WebKit: Scraping HTML Data | Swift 3, Xcode 8

Captions
Hello everybody, my name is Kyle Lee, aka Kilo Loco from kiloloco.com, and today we have a really interesting thing to go over. We're going to be going over WebKit, but more specifically scraping HTML data, and to be even more specific than that, we're going to be scraping data from a site that requires authentication, which means you have to log in before you can get access to the content. The tools we're going to be using today are Xcode 8.2.1, Swift 3, and SwiftSoup.

One second, I just want to go over why I'm still using these tools. At this point WWDC has happened, the iOS 11 beta is out, the Xcode 9 beta is out, and Swift 4, I think maybe in beta or whatever, is out as well. I'm not going to be moving over to those until maybe July, so just be on the lookout. Most of the changes in Swift 4 aren't going to be huge code-breaking changes, so your Swift 3 code should work, and if you're currently coding in Swift 4 this should still be relevant, or the conversion should be pretty easy. We're going to be working with SwiftSoup, which is an HTML parser, and it just makes it easier for us to get a specific chunk of HTML.

The skill level I've set to intermediate. What I'm going to be going over is actually pretty easy, but obviously you need to know the Swift language basics and know how to create and use data models. The part that makes this video a little bit more intermediate is that you need to know how to read and work with HTML, because it's not easy to do this for just any website you want to scrape data from. Also some understanding of JavaScript: you have to be able to write your own JavaScript if you want to scrape data. So that's why I'm saying it's a little bit more intermediate. If you're watching this video you probably know this stuff, so it
should be a beginner lesson for you, but for anybody who only knows Swift code this is going to be a little bit more intermediate, and you're going to have to look into some of the web-related stuff by yourself. I'd like to give a shout-out to Alberto Presti, I hope I said your name right. Alberto, this was not an easy video to make. It was actually extremely difficult to find the information on how to do this, but like I said, actually doing it is not very hard.

One last disclaimer before we get into it. The way I'm going to go about this tutorial, or project, whatever you want to call it, is that I'm going to show you the tools you need to scrape, but this isn't exactly how you would scrape the data. We're going to go through a step-by-step process as opposed to having it be an automated process. There are a couple of reasons behind that, and I'll get to them when we get there. I just want to let you know that this isn't exactly how you would implement it if you're trying to make a production-level app based off of a website, but I should be providing you with all the tools you need in order to do such a thing. You're going to have to optimize these things yourself, that's all I'm saying.

All right, let's get into it. We're going to start off with a new project, and you have to call it Hackin' the Web because it sounds cool. I'm just joking, you don't have to call it that, but it sounds kind of cool to me. We're going to put it in that KL Learning folder, because that's where I learn, and then create it. There we go, now we should have our project. So let's go over to our Main.storyboard. Once again, like I said, this isn't
going to be how you would implement it in production, but there are a few things I want to go over. The pieces that you will most likely implement in production are going to be like a login page, so we're just going to add two text fields, one for the email or username and one for the password. Let's go ahead and do that. It's not going to look pretty, and that's okay; the stuff in most of my videos never ends up looking pretty, but you guys get something out of it. It's ugly, but it works. I'm just dragging while pressing Option so I can duplicate those. We'll put the placeholder in here as "email", and here it'll be "password". Then you'd probably have a sign-in button, something like that: "Sign In". Let's make it a little bigger. That's pretty much all we need for our sign-in page.

We're also going to have another view controller, not a web view, and all it's going to do is present our data. We're actually going to be scraping data from Gmail; I figured that would be a good place to start, so I hope that's good enough for you guys. I could have done a table view controller, but whatever. Find the table view, just go ahead and put that in there, and then this one will stretch. We'll just pin it to make it look decent, and that's all we need to do there. Then lastly (sorry about that random mouse stuck there, guys, I don't know why that happens) I'm going to drag and drop it right here and choose Show. That gives us a segue, and we'll just call this "showEmails", I guess. I think that's all we really need to do for the UI. So we're going to sign in to Gmail, and when we sign in it's going to take us to a table view full of our emails, just kind
of like how you would see it, so it'd be the person that sent it to you and then the title of it. That's all we're going to be doing.

All right, so what you're going to want to do is import WebKit, and that's going to give us access to something called WKWebView. You don't really need to be able to see the web view, but I'm going to show it to you guys anyway, just to show you what's happening behind the scenes when you use a web view to do this hacking, or JavaScript injection, or whatever you want to call it. So we'll do let webView = WKWebView(). Is my simulator running? Oh yeah, let me quit it. Okay, so we have webView and we have access to it.

Now let's get some of our outlets connected, because we need all that stuff connected. So @IBOutlet, then var emailTextField of type UITextField, and then @IBOutlet var passwordTextField, also of type UITextField. We also have an action, so @IBAction func onSignInTapped, and we're going to do stuff with that. Okay, I spelled everything right, Xcode's tripping, and I know what I'm doing. Now we're going to move over to the Main.storyboard to connect these things. I've got the email, got the password, and got the button action on Touch Up Inside. There we go.

This one will be our emails VC, so let's go ahead and do that while we're still working with UI stuff. New File, Cocoa Touch Class, and I'm going to do a view controller, UIViewController. No, no, no, that's not what I wanted to do. All right, so we'll just call this EmailsViewController, EmailsVC, and then this is going to go in there. We don't need this. We are going to need an extension of it. Wait, are we going to need an extension? Yeah, we're going to need an extension, so UI
view controller... no, no, sorry guys, UITextFieldDelegate... not text field delegate. Wow, Kyle, what is going on. UITableViewDataSource. My God. numberOfSections and rows, jeez. We return 1 for the number of sections. For numberOfRowsInSection we're going to return 0 for right now, just so that there are no complaints. Then for cellForRow we're going to do let cell = UITableViewCell, and we want this with a specific style, the subtitle style, so we'll do .subtitle. There we go, Jesus. Now we want to be able to access the cell: textLabel.text is equal to, and we're going to just leave this blank for right now, and we'll do the same thing for detailTextLabel.text, which will also be blank for right now, and then we'll return cell. So we haven't really done anything yet. Now I'm right, you're wrong; I'm right and you're wrong; I'm big and you're small; nothing you can do about it. All right, so drag to that delegate, drag to that data source, and I think the UI is all connected. I think that's all we need to do for UI.

So now we're going to move back to our ViewController, which I would usually call SignInViewController. You know what, let's just do that, I like doing things right, even if I can't talk. All right, no, we'll just leave it, because everything is already connected. Don't break anything, Kyle, just leave it. Okay, it's fine. I'm sorry, guys.

All right, so what we've got to do first is set up a request to be made. In order to do that we have to start off with a URL, so we'll just do let url = URL, and we're going to initialize this from a string. Remember that when you initialize a URL from a string it's automatically optional, so we have to add an exclamation point. Then all we want to do is go to Gmail, so https://www.gmail.com. And then we want to turn
this URL into a request. We'll do let request = URLRequest, and we'll initialize this with the URL, our url obviously. From here, since we already have our web view initialized, we'll do webView dot... actually, I want to show you guys what's going on in the background, so I'm going to give it a frame so that we can actually see it. We'll do webView.frame is equal to CGRect, and I think 300 looked like a good size, so x will be 0, then we want it a little bit lower so we'll put y at 300, and then we'll do 300 wide and 300 high. That's not required; you don't have to be able to see the web view for this to work, but I want to show you what's happening throughout this process.

All right, with the web view, all we want to do is load that request, so webView.load and we'll pass in our request. Then we also want to add it as a subview so that we can see it, so we'll do view.addSubview and pass in our web view. This part is also not required; you don't have to add it to the view as a subview. We just want to be able to see it so that you understand what's happening in the background, so it's not just magic, Kyle's computer magic. We don't do that here. We explain things here. Well, I try to explain things, you guys watch, and then you try to understand what it is I'm trying to tell you, so there's a lot of trying going around here.

Oh, my bad, I almost forgot that you guys watch this on your phones sometimes. Okay, so the text is a little bit bigger now, hopefully you can see that. Nothing too crazy yet, we're just making a little web view. I didn't even realize I was over here being selfish with my small little text. It's running, like Forrest. Okay, so now our view will pop up, and then once it loads it'll actually pop
up. It should pop up somewhere right here if I did it correctly. Yeah, see? So this is our little web view, once again not mandatory, and as you can see right here, if I could scroll, this is our little login. Now Gmail is a little bit different. They make you put in an email and submit a form, and then they make you put in a password and submit a form, and then you're actually presented with the content you wanted to see. Most email providers that I'm familiar with don't make you go through this, but it's good practice to go through, so we're just going to deal with their process.

Since there are multiple steps, and I said I'm not going to make this automated, I'm just going to show you the steps to do it, I hope you guys are able to understand the idea of a counter. So counter is equal to zero, and whenever we go through a step we'll just increase that counter by one. Whenever we tap on Sign In we'll do plus one and then we'll go into a different step, so counter += 1. Once again, this isn't how you would do it in production, but step by step.

So what we actually have to do is go over to Gmail and take a look at their HTML. All right, I'm back. I had to change my password, because I wanted to make sure that you guys couldn't access my password and stuff like that. Okay, so we're going to move back over here and go to gmail.com, and this page comes up. What you're going to want to do, just in case, because things get a little bit weird, is if you're using Safari, go to User Agent and select a Safari iPhone one, because you're most likely developing for iPhone, and the
reason you want to do that is because pages look different. Sometimes you'll see the mobile pages, and the HTML will be different, so the fields that you have to access will be different. So what you want to do is inspect this element. You would normally inspect this element; it's not going to work exactly right here, but this is what I tried. So you inspect the element as I did, and it gives you back all this information. We can see that the type is email, which gives us some of the information that we're going to use, and it says that the ID is identifierId. I just wanted to show you guys this process, because this is how you would normally go through it, depending on whatever website you're trying to scrape. You'd need an access token in order to get in and do all this stuff, but now we know some of the information about the field that we want to access.

All right, so now we're going to move back over here, and the first thing we want to do is a switch based off of our counter. So switch counter, and then case 0, which it starts off at, and we'll just do break for right now. We're going to have five different cases... no, we're going to have four cases and a default, so let me show you guys that. So case, and then break, and these will just be our placeholders, right guys? And girls, not to be sexist or anything. Case 3, break; case 4, break; and default, break. Once again, this is the part that's mainly not how you would implement it. You wouldn't tap a button five times in order to sign in, but this is how I need to show it so that you can actually use this information.

So with WKWebView, the reason why we're not using UIWebView is that WKWebView is a newer API/framework that Apple implemented. It's faster, it's similar to Safari, and it has more functionality. One of the
functionalities that WKWebView has is that it can evaluate JavaScript. Oh, I'm sorry, I should let you guys see that: evaluateJavaScript, right here, see? It's expecting some type of JavaScript string, and it also gives you a completion handler that tells you whether that was successful or not. So you're going to choose that first one, evaluateJavaScript, and we're going to make the first argument a string; we'll get back to that part in a moment. Then we'll press Enter for the completion handler, and we're going to have some value, I'll just put value right here, and then error. I'm going to change this value placeholder name in a bit, once you see what the JavaScript is, but right now it's value just so that everything is clean. So print, whatever. All right, so now we have code in there.

This is the part where things get a little bit more complex if you don't understand JavaScript. What you have to do for this piece right here is write it in a string, which means you'd better make sure that you spell it right, that everything is capitalized correctly, and all that good stuff. So you have to do your web development. You do document, make sure everything is spelled right, and you want to get the element by ID, so getElementById, make sure I spelled that right. Just a forewarning: this is not exactly correct, and I'm going to have to go back and change some of this, but I think it's really important that you see the process. Then we use single quotes to say what we think it's called. What did they say the ID is? They said the ID is identifierId, so we're going to pass this in and see what happens. I'm going to tell you right now, this isn't going to work. This is a piece that doesn't work when you try to get it from the website, but this is the process that
you kind of have to go through to figure out how the website works. Then you do .value is equal to, and you also want single quotes here, and inside them we're going to do string interpolation. I'm having a terrible time with this. We want to pass in our email, so we'll do emailTextField.text, and we'll make sure that's unwrapped so that there's no Optional in there. And then for the completion... well, is that it? Yeah, that's it. Okay, so we're going to try that, and we want to print out our value, so we'll do print. Come on, print. Oh my goodness, this became a whole lot more complicated. And error. Okay, so we're going to try this and see what happens. I believe value is going to be nil, and I believe error is going to come back as something. I don't remember exactly how it went when I did this, but I wanted to show you guys that this ID, even though it's in the HTML, is not correct.

So wait for this to pop up so we can see it. We tap Sign In and nothing happened in the web view. We were expecting it to fill in the field. Look, the value is nil, like I said, and the error is "A JavaScript exception occurred", blah blah blah, null is not an object or something like that. It couldn't find what you said the ID was. So what we actually have to do is make a different call in order to figure out how the app is seeing the HTML, and then we have to go in and look for the ID that way. So we're going to rewrite this JavaScript in this spot. I'm going to copy it, and I'm going to comment this piece out for
right now, because we do need to have that line right there, but we first have to figure out what ID we actually need to get. So we're going to do document, and then getElementsByTagName, and remember all this stuff has to be spelled correctly. We're going to pass in, in single quotes, lowercase html, then we're going to take the first object, which is index zero, and then .innerHTML. Now for the print, we want to print out the value, labeled "innerHTML" just so that we know that it's the inner HTML, and we want to print out the error just so that we know the error. So let's go ahead and run that again. I would show you guys the answer straight away, but if you tried to use this in real life, it's not going to work as easily as me saying "okay, you just do this, this, and that". It doesn't work like that, and I want to show you guys the process.

So we wait for it to pop up. Okay, now I'm going to tap Sign In, and see, that's a lot different. Our error is nil now, and we have our HTML. Now you have to go back and parse through all this stuff. Well, let's use what we know and do it based off of what we see in here. What we're looking for is the ID for this field. We know that HTML controls the way the website looks, so that means the HTML has to have this string in it, "Email or phone". So we can search for that, and it will make it a whole lot easier. We don't want to do Filter, we want to do a search, so do Command-F right here, make sure that the logs are popped up, and we'll search for "Email or phone". It's saying right there, one result; this is where everything is happening. Now notice here where it says input id is equal to Email. We thought it was supposed to be identifierId, but it's not. So this is why we have to look for Email.
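A minimal sketch of the HTML-dump step just described, in case it helps to see it written out. It uses current Swift syntax rather than the Swift 3 of the video (the evaluateJavaScript(_:completionHandler:) call has the same shape in both), and the helper name dumpHTML(of:) is hypothetical, not something from the video. The WebKit part is guarded so the file still compiles on platforms without WebKit.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

// JavaScript from the walkthrough: the markup of the page as the web view itself sees it.
let dumpHTMLScript = "document.getElementsByTagName('html')[0].innerHTML"

#if canImport(WebKit)
// Hypothetical helper (an assumption, not the video's code): evaluates the script
// and logs either the returned HTML or the error.
@MainActor
func dumpHTML(of webView: WKWebView) {
    webView.evaluateJavaScript(dumpHTMLScript) { value, error in
        if let html = value as? String {
            // Search this log output (Command-F) for visible text such as "Email or phone".
            print("innerHTML:", html)
        } else {
            print("error:", error?.localizedDescription ?? "no value returned")
        }
    }
}
#endif
```

The point of printing what the web view received, rather than trusting the desktop browser's inspector, is that the two can serve different markup, which is exactly why identifierId failed above.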
So what we're going to do is, see, look, it says Email, and for Gmail it has all the stuff in place already right there. The ID for the password is Passwd. It's Passwd-hidden right now because it's hidden; when we submit the form, the email becomes Email-hidden. So we'll just copy both of those inputs right here and save them. So we have Email and Passwd as the IDs that we want to search by. Now we're going to lower this again so that we can see our code, and sorry to you guys watching on the phone, I know you can't read something that small, but I hope what I said makes sense, and that when you get the chance to do this on your computer you're able to go back and do it.

Let's just put them in comments right here so that we can access them, and always remember to copy and paste when you can, just to make sure that you're not missing anything. So we'll just do Email, and then... I've gone through this already. You could do the same thing once you get to the next form, where it presents the password; you just have to go through that process again. I don't think it's that important to show you the process again. You kind of just have to play with it, because every website is different anyway, so me showing you that wouldn't really benefit you. But I think it helps to show you that what a website gives you on a computer versus what it gives you on the phone can still differ, even if you change the user agent to say "I'm an iPhone" when it's really not.

So now we're going to get rid of this, because we got the innerHTML. You know what, I'll just comment it out, because we're going to need to use this JavaScript again later, and I'll put this line of text up here so that we have it for later. Okay, so on the first round, remember, we want to just pass in this Email right here, because
that's the true ID. So we'll do Email in the JavaScript. All right, now let's go ahead and run that. If we do it properly then we won't need the completion handler, it can just be nil, because this is going to run code that says: for the element that has an ID of Email, which is the email field, make the value whatever we have in our text field. So whatever we put in our text field, it should be that. Which part did I get wrong, or is Xcode tripping? Xcode's tripping. Okay, so let's go ahead and run that. Now when I enter an email in my email text field and press Sign In, whatever is in there will be in the field on the page. It makes more sense when I show you, like everything I do. Okay, so my email (and remember, if you're ever trying to reach me, don't reach out to this email, because I don't use it) at gmail.com. When we tap Sign In it should pop up right there, because we're injecting JavaScript into the web view, which allows us to manipulate it. So I'm going to press Sign In, and look at that: Kyle Lee at gmail.com is populated in there, and I didn't have to touch any of this stuff.

Okay, so now the next step is to actually submit this form. It's like pressing a button, but since this is a form, when you press Next you're not really pressing a button in web development terms. I don't fully understand this, but it's submitting a form. So what you have to do is inject JavaScript that says "submit this form that I'm talking about in particular". That means we have to find out what form this field is in. So we want to go back over here and find the form. In your search you're going to want to type the opening angle bracket and then form, to make sure that it doesn't match "inform" or a form field or anything else, and you want
to just do a search. Wait, did I do that right? Actually, I didn't do that right. Don't do what I just did. Okay, so what you want to do is just Command-F right here. Yeah, this is the one you want to look for, so <form like this. Now we'll get all the forms that match. So this is one form, supposedly. I don't know what this is or why it's bringing that up, but this is another form. Let's see if it's even in here. You know what, I might have to do the thing that I just did; I can't remember exactly how I got the form name. Okay, you know what, let's just run this again. Let's copy this so that we can see our HTML, close it like that, and print the value. It's a process, guys. It's not a fun or easy process, but being able to do this type of stuff is pretty cool. What you want is to be able to submit a form, but you have to specify which form you're submitting. I thought it was going to show in the HTML from the website in the browser, but obviously it's not doing that. I don't need anything typed in here, because I just want to be able to print. Now when I press the button again we get our HTML back, and we're going to do that same search down here. We search for <form and we only have one result. So what I did is I kind of just assumed that this was the form I needed, because if you look at the action, it says accounts.google.com/signin/v1/lookup, which sounds like it. And the id is gaia_loginform, so that's probably the one we're looking for. I had to take a guess, I had to try these things out, and it turns out that this is the actual ID of that form.
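With a field id and a form id in hand, the two kinds of script the walkthrough injects can be built as plain strings. The sketch below is illustrative only: the function names are hypothetical, the ids Email and gaia_loginform are this editor's reading of the garbled captions, and the escaping helper is an addition the video does not show (it is there because text typed by a user may contain a single quote, which would otherwise end the JavaScript string early).

```swift
import Foundation

// Hypothetical helper: escapes text so it can sit inside a single-quoted JavaScript string.
// Handles backslashes and single quotes only; newlines are not handled in this sketch.
func escapedForSingleQuotedJS(_ text: String) -> String {
    return text
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "'", with: "\\'")
}

// Builds the script that sets an input's value, as in the email step.
func fillScript(elementID: String, value: String) -> String {
    return "document.getElementById('\(elementID)').value = '\(escapedForSingleQuotedJS(value))';"
}

// Builds the script that submits a form by its id.
func submitScript(formID: String) -> String {
    return "document.getElementById('\(formID)').submit();"
}

// Example ids are assumptions taken from the captions; the address is a placeholder.
print(fillScript(elementID: "Email", value: "someone@example.com"))
print(submitScript(formID: "gaia_loginform"))
```

Either string can then be handed to evaluateJavaScript on the web view, with nil as the completion handler when the result does not need checking.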
So I kind of lucked out. Once again, it all depends on the website you're working with. There are so many different ways to build websites that it can be hard; you can't make blanket statements, and you can't assume they all use the same naming conventions. You can't assume any of that.

So next we're going to do webView, and we have to inject JavaScript, so we'll do evaluateJavaScript again. Remember that if we are injecting successfully then we just pass nil for the completion handler; we don't need to check back for anything else. So I'm going to just do nil. We'll do document, and then you have to get the element, singular, by ID, and we're going to pass in, in single quotes, our ID, which is this gaia_loginform. Since it's a form, we can say .submit with an open parenthesis, close parenthesis, and a semicolon. If this works, it will submit the form, which means that we can proceed. So I can type in my email, press Sign In, and that will populate the email; I'll press Sign In again and that will submit the form, and then it should bring up the password field. Let's go ahead and try that. When you look at it, the code is actually really small, and yours would be even smaller because you would automate this. You would use the completion handler, and maybe you'd chain all these calls inside one another.

So I'm going to start typing my email while it loads, Kyle Lee at gmail.com. We tap Sign In, and that should populate the field. It did. We tap Sign In again, and that should submit the form. See, now it says Password. We're in. So now all we have to do is repeat this process, and since we already know the ID for the password, we'll just copy and paste these steps again. So we'll copy, and we'll remove this
break and paste it in right here, and we'll use the correct ID, obviously. So you paste this in, and you want to make sure that you change this to Passwd and to the passwordTextField, and then we want to submit the form again. All right, so now I should be able to fill out my form in the app, and when I go through these steps we'll be able to watch it fill out everything. Every time we press Sign In, something else should happen.

Now, I decided to go step by step and make it so that you have to press Sign In multiple times for it to work, and the reason I chose to do this is that if you automate the process too quickly, you get CAPTCHAs. Well, first it's a checkbox that says "I'm not a robot", and you have to click it, and then maybe it'll be able to tell you're not a robot. If it thinks you still might be a robot, you get a CAPTCHA screen with those pictures, and then you have to deal with the CAPTCHA. I'm not going to get into that, because this thing gets really deep really quickly and we don't want to do that. Just be aware that if you do things too fast, websites nowadays are coded to check whether you're a robot, because you're not supposed to be scraping websites like this. It's not like it's illegal or anything, but I don't think people like it.

So I type in the email, and the password is scrapeme123. Now we'll tap Sign In, and that fills in our email. We tap Sign In again, and that submits our form. Now we're at the password, so we tap Sign In again and it populates our password. We tap Sign In one more time, and that actually submits the form for the final time. So now we're moving forward. All right, so I got to this screen: "You're missing out". Now this is another thing you have to be wary of; this will happen sometimes.
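Stepping back to the four taps just walked through, the counter-plus-switch idea can be sketched as a small function that returns the script for each step. This is an illustration under assumptions, not the video's code: the ids (Email, Passwd, gaia_loginform) and the 0 to 3 numbering are read from the captions, which are ambiguous on the exact case numbers, and SignInStepper is a hypothetical wrapper name. Current Swift syntax is used, and the WebKit part is guarded so the file compiles where WebKit is unavailable.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

// Maps the tap counter to the JavaScript for that step; nil once all steps are done.
// The element ids and the 0...3 numbering are assumptions read from the captions.
func script(forStep counter: Int, email: String, password: String) -> String? {
    switch counter {
    case 0: return "document.getElementById('Email').value = '\(email)';"
    case 1: return "document.getElementById('gaia_loginform').submit();"
    case 2: return "document.getElementById('Passwd').value = '\(password)';"
    case 3: return "document.getElementById('gaia_loginform').submit();"
    default: return nil
    }
}

#if canImport(WebKit)
// Hypothetical wrapper (not from the video): call signInTapped once per button tap.
@MainActor
final class SignInStepper {
    private var counter = 0
    private let webView: WKWebView

    init(webView: WKWebView) { self.webView = webView }

    func signInTapped(email: String, password: String) {
        guard let js = script(forStep: counter, email: email, password: password) else { return }
        webView.evaluateJavaScript(js, completionHandler: nil) // nil: the result is not inspected
        counter += 1
    }
}
#endif
```

Note that the values are interpolated as-is here, so an email or password containing a single quote would break the injected script; real use would want to escape them first. Advancing one step per tap also keeps the pacing manual, which matches the reason given above for not automating the sequence.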
they'll have something like a little pop-up. You can check for the pop-up to see if that's what's coming up, but here's what we're going to do to get around it. If I run this entire thing again, first of all it'll keep me logged in; the app will automatically store the cookies. Obviously you can change all that, but it's going to automatically log me in, so let me log out and go through the process again. So we're going to go to, if I can remember, More, and... man, it does not like interacting with that. There we go. Oh wait, look at this, that's what I wanted, the little thing right here. Now I've got to sign out. I don't know why the hell it doesn't let me sign out. Oh boy. Oh, probably that drop-down, right? Yeah, sign out of all accounts.

So now I want to go back, sign in with a different account, and remove this one so that we're back to normal. Now if I go through this process again, you'll notice that the pop-up won't come up, the one that says "You're missing out, get the Gmail app", blah blah blah. So we're going to sign in again. I just want to show you guys that, because I wasn't sure how to handle it at first, but then I realized what it was doing. So, scrape me one two three. We do sign in, submit the form, put in the password, submit the form, and now it won't give us that pop-up. Watch. See, it went straight in.

Okay, so what I have to do is stop the simulator and actually delete the app in order to get it back to a state with no cookies. I could figure out how to clear the cookies, but I didn't want to look into it because I don't really care to get deep into this type of stuff. It's not really for me; I'd have to deal with web development, and I don't like doing web development. Anyway, as you can see, if the site knows that you've been there before, it won't ask you to do that again, which is
smart UI, or rather UX. Yeah, smart UX. So all you really have to do is refresh that screen and it will skip over the pop-up, or at least that's my workaround for it. So all I'm going to do is webView.reload. Then for the last piece, once we do that, I want to actually see all the HTML. We don't need these anymore. I want to see all the HTML that's in there so that I can parse it, so we'll name the result innerHTML, call it whatever you want, and we'll print innerHTML so that we can see it. Sound good?

I hope this process is making sense so far. You've already gotten through the hardest part, which is really just figuring out how to find the ID and then submit the form. Finding out how to submit was the hardest thing for me, and I don't know why, because it's so simple: you just find the form and call .submit. But nobody tells you these things; if you don't have much web development experience, it's hard to find and learn this stuff. But whatever.

All right, so we're going to log back in as me: are Kyle Lee at gmail.com, scrape me one two three. We go through the process: sign in, sign in again, sign in again, sign in again. Now we should get that pop-up. See, we get the pop-up, and all we have to do is reload the page and it goes away. It's one workaround. Then we want to see what the HTML is, so let me clear all this stuff out, because a lot comes back. I'm sorry, I thought we were going to use this a lot more than we did, but we didn't use it, so my bad.

Okay, so it's quite a bit of HTML; as you can see, it's still coming in. What I want to get is the person that sent the message to me, which is this bolded thing, and then the little message text that goes with it. It's not going to let me drag, but that's okay, we can just search
based off of this. We want to get the title, and this one is pretty unique: "New categorized mail". Google would probably be a hard one to search for. Swift evolution, since I'm subscribed to that, comes in a lot, so that's also a hard one. "New categorized mail" I think would be pretty unique, so let's go ahead and search for that in our logs. New categorized mail... maybe it has to be capitalized. There it goes. Wow, Xcode does not like me doing this. Okay, I don't know what's going on with Xcode. There it goes, it's kind of working, it's waking up. So we have "New categorized mail": that's one, that's two. This is in a pretty unreadable area. Luckily for us, in Safari this part is the same, so let's go ahead and log in as me: are Kyle Lee at gmail.com, password scrape me one two three. See, I get the same thing; it kind of thinks that I'm on... so we'll just reload the page.

Okay, so we're going to use "New categorized mail" as the unique string to search for. New, let's spell it right, categorized mail. All right, so now we have access to that. What we want to do now is take a step back and look at what's going on. We want to be able to see what class this is related to and where it's at, and we're going to let SwiftSoup handle that for us. Give me a second, and I want to go over that part.

So this is where the SwiftSoup stuff comes in. Let's go ahead and import SwiftSoup, in case you forgot already. Well, not already, because it's been a long tutorial, so no blame on you; blame me, more likely. We want to import SwiftSoup, which is going to help us parse our HTML. So go to Google, and while that's working I'll open up my terminal and get it ready. We want to search SwiftSoup, all one word. Okay, and where
are the CocoaPods instructions? Well, you know what, I think they forgot to put how to import it in their examples. It's not in here, but the pod is just SwiftSoup. So all you do is change the directory to your project, so Desktop, KL learning, and then hacking; I sort of called it Hacking the Web. We'll do pod init, then open the Podfile, and right here all we put in is pod 'SwiftSoup'. Save it, do pod install, and it's installing. When in doubt, spell it out. All right, it's installed, so now we can open our project back up, if it lets us, and obviously you want to open the workspace.

So now we need to create a new object that's going to work with our HTML response; I'll call it GmailResponse. Let me get that going: new file, it's going to be a Swift file, and we'll call it GmailResponse, or whatever you want. This is where we want to import SwiftSoup, because this is what's going to help us parse the HTML, whenever Xcode is ready. If you're interested in SwiftSoup, it can do a lot of things, but it can't do what we've done up to this point, or at least I don't think it can; it's more for parsing the HTML that you already got back. And it's still indexing. I'm sorry, guys, these videos take longer because Xcode takes longer. I'll force it to quit and see what happens, open it back up, and then we want our recent project, which is the xcworkspace. We'll open that up and hopefully it will work. There we go.

So, GmailResponse. We're going to import SwiftSoup; not really sure why they call it soup, kind of weird. We'll make this a struct called GmailResponse, and we'll initialize it with an Any; we'll call the parameter innerHTML. It doesn't need an external name, we'll just pass in the innerHTML, and that will be of type Any, optional Any actually. Then we'll do guard let htmlString... do we want it as a String? Yeah, I
think so. htmlString equals innerHTML as a String, else return. Actually, do I want to do that? Let me double-check how I did it, because I want to do it the right way. Yes: because SwiftSoup does a lot of its error handling through a try-catch block, we'll actually make this throw. To make this throw, we'll add some type of error; we'll just call it HTMLError, of type Error, and this is actually an enum, don't forget to put that in there. We'll do a case, let's say badInnerHTML, and we'll actually throw that: throw HTMLError.badInnerHTML. All right, let's build that and make sure everything is correct, and I believe it is, because I know what I'm doing. Okay, it's just saying that we never used it, and that is fine with me. Oh no, "can't load"? What do you mean? It succeeded. I think it's just tripping. I'm going to proceed as if it's working, because it said the build succeeded even though it's acting like it didn't.

Okay, so with SwiftSoup, what we have to do first is get the document. Remember that all of this is going to be in a do-try block, so we can throw whenever we want. We'll do let doc equal, and we have to try this, try SwiftSoup, very conceited, they use their own name, .parse, and we pass in our HTML string. If this succeeds, then we have access to the actual HTML. It should work. Now we have to parse that HTML, and we have to figure out how we're going to parse it. So we go back over here to Safari, and what we can do is get elements by class name. If you notice, the class most closely related to this message is mg space JL. I don't know what the hell that means, but if we go to a different one we'll notice that it has the exact same thing. We just want to verify
that it has the exact same thing. So if we go to inspect element in the mg JL area, if we can open it up enough, and make sure it's highlighted, see how Google is highlighted? We'll do the drop-down on that one, make sure that Google is highlighted again, and we'll notice mg space capital J L, and the text is right there. I don't know how Google goes about naming their stuff, but this seems to be the class that we want to work with. We're going to call that person the author, because if it were an actual human, that would be their name. So we'll do let author equal try doc.getElementsByClass, make sure it's plural, elements, and we pass our class in right there, and we want the text from that class. Remember, to minimize your chance of error, you copy this. You always copy it, even if it's only two letters. I don't care if it's one letter, you copy it. I'm just joking, but it definitely makes things easier to debug if you know for sure you're using the right classes.

Okay, so now I want to get, let's say, Swift evolution. I want to get "Swift evolution digest", so that's the message that was sent to me. Or actually, let's get the longer message; which one's better? You know what, I kind of want to just get the titles, so we'll go for the title: "Just checking up on your progress". So inspect element, and we'll wait until "Just checking up" is the one that's highlighted, and then I'm going to drop down until I see it highlighted again. So it looks like the class for this one is mg... oh no, no, no. See, this is why you double-click and copy. You see that space? That would have thrown off everything. Space mg space kale. Copy that. So that's going to be our title: let title equal try, and we do the exact same thing, doc.getElementsByClass, and you paste it in. See that space? That could mean everything. And we get the text from
that. So now we have to iterate through both of these, or actually we'll just iterate through one of them, the titles or the authors. Since we're going to have to iterate through both, we'll do for i in zero to titles.count. Okay, elements by ID... hold on, let me see exactly how I did this. Okay, so these are plural, because it's getting elements. Oh, it's elements, okay, my bad. I'm doing this a little bit backwards, my bad, guys, I'm sorry. We're getting multiple elements, and we want to make sure that this is an array so that we can iterate through it, so we'll do .array. I'm sorry if this is confusing now, but we want to make sure that we're getting an array back, because we're getting multiple elements and we want to be able to go through them. So this should be titles.count, and then we loop through it. Now whenever we index into our authors, we should get just one element, which we can access. So we'll do let author equal authors, and we pass in i for whatever index we're at at that moment, and then for this element we get the text. Not getText, it'll just be text, I think. Yeah. Okay, this is what we wanted to do, I'm sorry. So let title equal titles at whatever index we're at, .text. Now we want to print out our author, print out our title, and then print out just a space.

One thing about doing it this way is that when you're iterating through, you have to be very, very careful. Wait, why doesn't it like this? I'd assume that it's okay with it. Now, this should be fun. Oh, you have to try this too, my bad: try and try, because both of those throw. What I was saying is that you have to be very careful about how you do this, because you don't want to iterate out of range for
something. Let's say for whatever reason there's one more author at the end, or there's one more title at the end but no author, or there's some other mismatch in the count: that will cause your app to crash, and then bad stuff happens, because your app crashed.

Okay, so now we're going to go back to our view controller and pass in this innerHTML, and we want to make sure that all of this is working and all this work has not gone to waste. I don't know what made it all small again; I wanted it to take up a little more screen. Okay, so remember that we have to do this in a do-try block, because SwiftSoup does its error handling through do-try and we made our object throw. So we want to do let gmailResponse equal GmailResponse, and we initialize that with innerHTML, and we have to mark it with try, remember, try. Then, if we're able to initialize it, we print "got response" or something like that. The only things that should be printed out after we go through this process are the author names, which are going to be like Google or Swift evolution or Team Treehouse, and then the header or title of their messages.

All right guys, welcome back. What ended up happening is that because we were in the xcworkspace, it had automatically set the simulator to iPhone 7 Plus, I don't know why. When I have OBS running, it can't open up another simulator, or else it never finishes launching for whatever reason. So I turned off OBS so that it could open this simulator again, deleted the app, and reinstalled it, just so that we can log in and watch the entire process unfold up to this point again. So let's go ahead, get rid of all this, and get back into it. Okay, so once again, email: are Kyle, a-r-e Kyle Lee, at
gmail.com, and the password is scrape me one two three. We'll do sign in, which will populate, then submit, populate, submit. We're going to see that one ad for the Gmail app, so we refresh the page, and then we get the regular sign-in. From there, remember that we took out the innerHTML print, so now when we do sign in, it's going to create the GmailResponse, and it should only print out our authors and titles. Let's see what happens. It took a little while, but it did it. Whoa, please don't stop. Please, no, no, no, don't open up more stuff. Okay, so notice: title, message. So now we're parsing all the data that we want.

Now we can actually start moving into our other view controller. But we have to create a new object, actually; we have to create emails. So we'll do a new file, create a new data model, and call it Email. Foundation is fine. It's a struct called Email, and then we'll initialize... oh no, we don't even have to put that in there, do we? No. Okay, so we'll just do let author, because we did author first, of type String, and let title of type String. So now we have an object, and we want to actually create an email and be able to access it from the response. So we'll do let emails be an array of Email, and we also want to be able to build it outside of the loop, so we'll do var emails, of type Email array. Or actually, we should probably make it... just in case something happens, for whatever reason, so it won't bomb this thing. You know, good practice. Then emails.append(email), and we set self.emails to our internal emails. So now we have our emails and we can access them from the response.

Now we want to go back to our view controller, where we say we got this response, and we're going to do performSegue. I think I'll call it showEmails. The
sender is going to be our gmailResponse.emails; that's what we want to send. Now we have to prepare for the segue, and we want to be able to pass the emails data to wherever you want to pass it; I think you know what I'm trying to say. So we do guard let emails equal sender as AnyObject, this is for Objective-C bridging, as an array of Email, else return. Then we're going to make sure our EmailsVC has something to receive those emails. We'll call it emails as well: var emails, of type array of Email. Oh, actually, do we want to... no, we don't need to. For number of rows in section, it will be emails.count. And then we want to access the properties properly, so emails, specifying which index, so indexPath.row, .author for the big label, and then emails at indexPath.row, .title for the little one. Okay, so I don't think we need any of this. Do we need viewDidLoad? I don't think so. I think this is all we really need, honestly.

We have to make sure that we're passing it, so we go over to our view controller. Now we have our emails, and we want to make sure that the destination view controller is EmailsVC, so we'll add, comma, let emailsVC equal segue.destination as EmailsVC, else return. Then all we have to do is pass the data, so emailsVC.emails is equal to emails. It sounds extremely confusing, but it's okay; if you made it this far, then you probably know how to do this. This isn't a closure. Let's make sure that our storyboards are set up properly. Let's see. All right, I'll stop the simulator; that's not helping anybody. So if everything works, we should be at our final product, and it should be showing the emails after this. All I have to do is make sure
that this segue is called showEmails; that's all I really need to check. And I have to make sure that this is of type EmailsVC, which I don't think it is, so EmailsVC, and also make sure that the data source is connected. It is. I think that will work. So now I'll have to tap the Sign In button like five different times, even though I'll already be signed into Gmail, and on that last time it should perform the segue and all the emails should populate. Once we see that the web view has refreshed, since that's the second-to-last step, we'll know that we're ready to go. And once again, you don't need to have the web view showing, and hopefully it will load a lot quicker for you as well. So: one, two, three, four, five. That should refresh it. On the next one it should be able to parse the information, perform the segue, and we should get our emails. You see, it takes a little while for SwiftSoup to parse; I'm not really sure why it takes so long. But notice we have all of our information. Look at that. And that is how you parse data.

Okay, I've been sitting at this table for an hour and a half, I've been going through technical difficulties, and it's Father's Day; I'm supposed to be with my family. So I'm going to leave it right there. You know about all the stuff: join the Slack, reach out to me on Twitter and on YouTube and all that good stuff. If you have any questions, make sure you reach out to me; I'm always here. Also make sure you go check out my channel update video, because I'm going to be going through some stuff and I want to make sure you guys know what my plan is moving forward. All right, I hope this was helpful. I know it was a long video, but hopefully you have all the tools that you need in order to do whatever it is that you want
to do. All right, my name is Kyle Lee, aka Kilo Loco, from kiloloco.com, and as always, remember: code passionately.
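
The form steps described in the transcript (populate a field by its ID, then call submit() on the form by its ID) can be sketched roughly as below. This is a minimal sketch, not the code from the video: the helper names `escapedForJavaScript`, `fillFieldScript`, `submitFormScript` and `inject` are hypothetical, setting `.value` to populate a field is an assumption, and `#if canImport` needs a newer Swift than the Swift 3 used in the video.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

/// Escapes backslashes and single quotes so the text can sit inside a
/// single-quoted JavaScript string literal. (Hypothetical helper; it does
/// not handle newlines.)
func escapedForJavaScript(_ value: String) -> String {
    return value
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "'", with: "\\'")
}

/// Builds JavaScript that sets the `value` of the element with the given ID.
func fillFieldScript(elementID: String, value: String) -> String {
    let id = escapedForJavaScript(elementID)
    let text = escapedForJavaScript(value)
    return "document.getElementById('\(id)').value = '\(text)';"
}

/// Builds JavaScript that submits the form with the given ID.
func submitFormScript(formID: String) -> String {
    let id = escapedForJavaScript(formID)
    return "document.getElementById('\(id)').submit();"
}

#if canImport(WebKit)
extension WKWebView {
    /// Runs a script in the page and logs the error if injection fails.
    func inject(_ script: String) {
        evaluateJavaScript(script) { _, error in
            if let error = error {
                print("JavaScript injection failed: \(error)")
            }
        }
    }
}
#endif
```

With a loaded web view, one press of the sign-in button might call `webView.inject(fillFieldScript(elementID: "email-field", value: "someone@example.com"))` and the next `webView.inject(submitFormScript(formID: "login-form"))`. Both IDs are placeholders; the real ones come from inspecting the page.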
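
The parsing step can be sketched along the same lines. The `Email`, `HTMLError` and `GmailResponse` names follow the transcript, but `makeEmails` and the two class names are placeholders, and the sketch deliberately swaps the index-based loop from the video for `zip`, which stops at the shorter list and so avoids the out-of-range crash the transcript warns about. The SwiftSoup part only compiles when that pod is available.

```swift
import Foundation
#if canImport(SwiftSoup)
import SwiftSoup
#endif

/// Data model for one scraped inbox row.
struct Email: Equatable {
    let author: String
    let title: String
}

/// Error thrown when the web view result is not a string.
enum HTMLError: Error {
    case badInnerHTML
}

/// Pairs authors with titles. `zip` stops at the shorter array, so a
/// mismatch in the counts drops the extras instead of crashing.
/// (Hypothetical helper, not from the video.)
func makeEmails(authors: [String], titles: [String]) -> [Email] {
    return zip(authors, titles).map { Email(author: $0.0, title: $0.1) }
}

#if canImport(SwiftSoup)
struct GmailResponse {
    let emails: [Email]

    init(_ innerHTML: Any?) throws {
        guard let htmlString = innerHTML as? String else {
            throw HTMLError.badInnerHTML
        }
        let doc = try SwiftSoup.parse(htmlString)
        // "author-class" and "title-class" are placeholders; copy the real
        // class names from the page inspector.
        let authors = try doc.getElementsByClass("author-class")
            .array()
            .map { try $0.text() }
        let titles = try doc.getElementsByClass("title-class")
            .array()
            .map { try $0.text() }
        emails = makeEmails(authors: authors, titles: titles)
    }
}
#endif
```

The view controller would then wrap `try GmailResponse(innerHTML)` in a do-catch block and hand `emails` to the next screen as the segue sender.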
Info
Channel: Kilo Loco
Views: 24,248
Keywords: how to scrape web data swift, how to inject javascript swift, how to use webkit swift, how to parse html swift, how to use SwiftSoup, how to fill out a web form swift, how to press a webview button swift, what is swiftsoup, what is webkit, website parsing swift, website manipulation swift, form manipulation swift, intermediate swift, javascript inject swift, wkwebview swift, webkit swift, html swift, scraping swift
Id: gscuaUSkxnI
Length: 84min 15sec (5055 seconds)
Published: Sun Jun 18 2017