Python 3 Programming Tutorial - urllib module

Video Statistics and Information

Captions
Hello everybody, and welcome to the Python 3 tutorial video. In this video we're going to be talking about another one of our standard library modules, and that's going to be urllib. The idea of urllib is that it allows you to access the internet via Python. So just like the internet allows you to do all sorts of amazing things, urllib is going to let you do all sorts of the same amazing things, only using Python as your programming language. So with that, let's go ahead and get started.

There are only a few core things that you need to do in order to connect and get data from the internet, but then there are some slightly more advanced topics that we do need to cover with urllib. We'll get there. Luckily, it's still a fairly simple module.

The first thing that you're going to have to do is import urllib. Now, if you're coming from Python 2.7, you're used to just needing to do import urllib or import urllib2, and that's it. Whereas with Python 3 and onward, you're going to be more so doing at least import urllib.request, and when you do, like, urlopen for example, you'll have to do urllib.request.urlopen, and so on. But anyway, more on that in a little bit.

So an example of visiting a website will be as follows. Let's say the final variable is x. We'll say x equals urllib.request.urlopen, and then in these parameters is where we specify the address that we want to visit. You always have to lead this with http or https, so for example http://, and let's go to www.google.com. OK, so what that's going to do is make a request to that URL, and by default it will be a GET request, so it's going to get some data, and that's it. Now what we can do is say, let's do print x.read(), so we're reading the request. So we can now save and run this, and this is our output: just a whole bunch of, you know, gobbly
text, but this is basically the source code of google.com. So for example, we could open up a browser, go to google.com, and hit U, Control+U rather, and this is the source code, right? So again, it's just a bunch of junk here, but you get the idea: what we've done is we've used Python to reach this page. OK, so we can minimize that, and let's go ahead and close out of that too.

Naturally, and we're going to cover this very soon, when you visit a URL you're going to need to parse that page a little bit. You're not as interested in the HTML as you are in, like, the paragraph text, for example. You're probably only going to care about paragraph text, so we're going to have to show how to handle that, and to handle that we'll be using another standard library module. So have no fear, we'll be covering that very shortly.

The next thing I want to talk about is POST. Let's go back to where we were, and let's say we want to go to pythonprogramming.net, and that's where you can get all of the sample code, if you're not familiar. If we scroll down to the bottom, there's actually a search bar here. Let's say we search for basic. OK, and you get a bunch of search results for the keyword basic. But if you look at our URL, you see that we have some extra stuff added to the end of pythonprogramming.net. So what do we have? Well, we've got a question mark, then the character s, then the equals sign, then basic, and then an and sign (&), and then submit, an equals sign, and search. So with a little bit of deduction and reasoning, we could assume that submit is a variable and s is a variable, and they've been defined as s equals basic and submit equals search, and that is true. So if you look at links that have variables in them, the first variable will have a question mark, then the variable name, equals, and then all
subsequent variables are going to have this little and sign (&), then the variable, equals, and it continues on like that. So this is an example of a GET request, we're getting data based on these... well, actually it's a POST, right? We're getting data based on these posted variables.

So let's say we want to make a POST request. Now, first of all, you could just do a GET to this URL, right? You can just use a request and put in this URL. But the other thing that we can do, and the more, you know, Pythonic thing that we're supposed to do, is go to pythonprogramming.net, add in these values, and do a POST to pythonprogramming.net. So let's go ahead and show how that's supposed to be done.

Here I'm just going to comment this out, because we don't need to be printing that out every time, I just wanted to show an example. I'm also going to comment this out, because otherwise it's just going to be visiting every time, and we don't need to do that. Now we're going to import another thing, and that's going to be import urllib.parse, and this is going to help us parse values for our POST requests. This one's going to be a little bit longer to get data, but just bear with me.

First we're going to say url, and this is just going to be the base URL that we want to visit. Again, always lead in with http://, and then pythonprogramming.net. That's it. Then we're going to have a dictionary, and we're going to say this dictionary is called values, and it's empty for now. Very quickly, we're going to say 's', and that 's' corresponds to 'basic', then a comma, and our next value is, if you recall, 'submit', and that submit was called 'search'. Don't know if I closed that or not... no, we didn't. OK. So it's basically key and value, like a dictionary: our key is the variable, and then this is the variable's definition, basically. So that's values. Now what we do is come down, and we're going to say data, so this is
going to be data from the website, equals urllib.parse.urlencode, and we want to encode values. So first we're simply encoding values, and what urlencode is going to do is encode it as it should be in the URL. For example, if we go back to where we've been working, go to google.com, and do a search for, um, hey check that out. OK, you see that it's hey+check+that+out. You could also do the query as this, hey check that out, at least usually... there it goes. OK, and you can see that it has changed now to hey%20check%20that%20... what was that? That's URL encoding. %20 is the encoding of a space. And then obviously you would need to have URL encoding for, like, a question mark and so on. So that's kind of why you want to do it the official way rather than hard-coding, because as soon as you introduce and signs or question marks, Python is not necessarily going to know which you meant.

So first we encode values as the data that we want to post. Then we're going to say data equals data.encode, and we want to encode this as utf-8. This is just a type of encoding, and it basically puts your data in bytes. The next thing we're going to do is say req, for request, equals urllib.request.Request, with a capital R on that second Request. What do we want to request? The URL, and then data. So first the URL, then any of the data that we want to pass through, and we've encoded that data under utf-8. OK, so we're going to request from this URL, pythonprogramming.net, and pass the following variables: s equals basic, submit equals search. Then after req, we're going to say resp, for response, equals urllib.request.urlopen(req). So now with urllib.request.urlopen we're actually visiting the URL, like we did right up here. The syntax is identical, and really we just see that we had to do all of this first.

Then we're going to say respData equals resp.read(), and then let's go ahead and print respData. So we're going to visit pythonprogramming.net, pass through those variables, and then read the results, and again the results are going to be the source code of the results page, so it's going to be a little messy, but hopefully we'll be able to read a little bit here. We'll save and run it. Apparently sometimes it takes a second to run... there we go. So it visits it, and then here is the messy junk that we get. Pretty big mess. I'm hoping we can find something that's actual text. Yeah, so here's some text here, you know, a paragraph: Python 3 basics tutorial, if something do something, this tutorial is a part of the Python 3 tutorial series for beginners, and so on. So there's some content there. We're not quite sure yet how to pull that content out, but nonetheless we did visit the website and we downloaded all the data.

What your browser does for you, and what HTML does: HTML basically tells your browser how it should display data, but that's really it. Your browser handles the HTML and makes it pretty for you, and it separates tags from text and organizes things, whereas Python is just going to look at the source. You have no organization here, right?

So now what I'd like to have us do is change one more thing. Here we're making a request, right? We've done a GET; this was very simple because that's the default, so we didn't have to change anything. And then we've made a POST, and we've made that POST based on data that we've decided to set. But now we come to a problem that you'll come across fairly soon, and that is: whenever you want to visit a website using
Python or any programming language, sometimes the website owners don't like that. They don't want you on their website with a robot or a program or whatever; they only want real users on their website, so they will block you if they sense that you're not a real user. Now luckily for us, this is actually somewhat easy to fool, basic systems anyway. There are some more complex ones; Google just recently made another update and has made it slightly more difficult to cheat their system, but still, you can usually overcome these things fairly easily. It's almost like, I don't know, some sort of filter, right? If you're not good enough, you can't use it, but if you're good enough, I guess you can still use their services with your program.

That said, usually websites that block your access do it because they offer an API, and they want you to use their API. Google offers an API, so try to use Google's API before you cheat Google. Try to use Wikipedia's API before you start cheating Wikipedia and just, you know, programming your way around it, because the API is going to make it easier on Wikipedia, and it's going to make it easier on you too. They don't need to send all of the HTML data, and they don't need to serve advertisements, right, because your program isn't going to read them, that kind of stuff. So you do have to pay a little bit of attention there as far as whether they have an API or not.

Now, moving along, I guess we'll show an example. For now I'm just going to comment this stuff out so we're not doing that over and over. Now I'm going to come down here, and first let's make a try and except. We're going to say try, x equals urllib.request.urlopen, and the URL we're going to attempt to open is http://www.google.com/search, and then a question mark, so we're defining a variable here, and we're going to say q equals test. q stands for query, for Google. So we're going to attempt to visit this URL, and this is a search request for
the string test. So this is as if you went to Google, typed test in the search bar, and hit Enter. So we're going to attempt to do that. Now that we've done that, we'll come down here and say saveFile equals open... actually, you know, I don't want to do this, I am positive this will fail. So instead we'll just do print x.read(), that should be enough. Then we're going to come down here and say except Exception as e, and then we're going to print str(e). So what we're going to do here, again, is attempt to visit Google, do a search query, and read the results, the source code of the results. We're going to try to do that; otherwise we're going to throw the exception as e, and then print the string version of that exception. Let's go ahead and run that and see what happens to us. So we run that and we get HTTP Error 403: Forbidden. We're forbidden because Google says, hey, you're a program, and we're going to go ahead and say no.

OK, so if you happen to find yourself in this situation, here's how you get around it. We did the try and except, and we failed. Now what can we do? Let's make some more space and switch this up a little bit. We're going to do try, and I guess we could use some of the same code up here, but we'll retype it. We're going to say url equals, and we'll use the exact same URL, so I'm just going to copy and paste the URL. Now we're going to go ahead and say headers equals an empty dictionary. What headers are: you send in a header every time you visit a website, and it contains information on you, who you are, your IP, your browser, your operating system, all kinds of stuff. It sends a bunch of information on you, and within your headers there is a piece of data that is called the user agent. So now that we've got headers defined, let's make some more space, and we'll say headers and then square brackets to
define a piece of data in this dictionary, and we're going to call this piece of data 'User-Agent'. The user agent is basically the type of browser that you are using. In our case, what Python does is it says Python-urllib/ and then your Python version, so for me it would be 3.4. So almost in an instant, when you visit a website with Python using the methods that we've shown so far, that website knows exactly what you are. They know you're a program, so it's very easy for them to shut you down, because you basically say, hello, knock knock, my name is Python, right? And then their servers say hell no, and they shut you out.

So now what we want to do is set the user agent, and the user agent that we're going to use is a little long. I don't want to have to type it out, so I'm going to go ahead and just copy and paste it like this, and I'll put a link to it in the description. If I happen to forget, someone remind me, but hopefully I won't. So paste, and we get this long user agent here. You can't even really see it all in my window, but this is it. Super long user agent. Basically this acts like we're using Mozilla, and then it gives all this other information and all this compatibility stuff, and so basically it just changes things so that we're no longer announcing ourselves as Py...

I have no idea where I left off, so I'll just kind of start at this point here. Turns out my dog knows how to open a sliding glass door, so he was running around in here when he shouldn't have been. That was very surprising. Anyway, yeah. OK, so here we're replacing our user agent in an attempt to fool Google, so we'll see if we can. So there's our user agent. Now we're going to do basically the same thing we did before, and we're going to say req equals urllib.request.Request, with a capital R, and then we're going to make the request to the URL, and then, remember, before we also had data.
Well, we're not going to pass through any data here, because we're, like, hard-coding it in, right? Under normal circumstances, if we were making a POST request, we could do that, and we would add in the search data, or the values, and make the POST. But we're just going to hard-code this for now. Feel free to mix them on your own time, as a homework assignment. So, url, and then we're going to say headers equals headers. OK, so we're telling Python now to visit this URL, and instead of sending our normal headers, the default headers parameter, we're going to change these up and send these headers.

Now, in my opinion, it kind of makes a little bit of sense to eventually go into urllib.request.Request. That's a function, and that function has parameters, and they define a default value for headers. Why not just go in there, edit the urllib function, and make this your default header? Just a thought for some of you guys.

Anyway, moving right along. So now we've defined what the request is, and now we're going to say resp equals urllib.request.urlopen, and what we want to open, basically, is req, with this as our headers. So that's our response. Now we're going to say respData equals resp.read(). The amount of data here is actually kind of big, right, because it's a whole search result page and all the HTML that goes with it. It's very big and bulky, so if we were just to print it out to the console, it would lag the console pretty badly. We don't want to do that, so instead we're going to say saveFile equals open, and we're going to call this file withHeaders.txt, and we're going to open it with the intention to write. Then we say saveFile.write, and we have to write the string version of respData, because right now the response data isn't in string format. So that's also kind of new-ish if you're coming from Python
2.7. And then of course we need to do saveFile.close(). Now, the other thing we have not done: we did a try, and we have no except yet. So we're going to say except Exception as e, and then we're going to print str(e), just in case we throw an exception. Hopefully we don't; unless I've screwed up or something, we shouldn't. Then, this is where that file will go, it'll just go right over here, and we should be ready to run this. So let's go ahead and save and run that. The first one will throw... I can't remember. OK, so the first one throws the Forbidden, yeah, because we're still trying that, but the second one worked, because we didn't see a Forbidden. We come over here, and here it is, withHeaders. We can open it with Notepad++, and here's all of our data. Obviously it's a bunch of junk, you know, but this is all search results. There were some images there, and eventually we could maybe get to some sort of text or something, but anyway, this is a huge mess. Google results are pretty messy. But we were able to get by Google's little filter for just anybody, right? If anybody had just read the documentation, they'd find out how easy it is to change your headers, but a lot of people don't read documentation, so I guess that's why.

So that's going to conclude the basics of urllib. Now again, the data that we're fed back is just this huge mess of data. Like, what do we even do with all this data? You have to kind of parse through the data, so the next thing we're going to need to learn is regular expressions, to actually parse through this data. Regular expressions are kind of scary to people sometimes, mostly because it's its own programming language entirely, so everything you know about Python up till now doesn't mean anything when it comes to regular expressions. But luckily, regular expressions, being basically their own programming language, are transferable pretty much anywhere you go. The
rules of regular expressions will remain, so once you understand the logic of regular expressions, you can take it to any language. It's a lot like SQL, right? If you learn SQL, or as the cool kids say it, sequel, it's its own programming language, and you can take it anywhere, to any other programming language, and work with sequel, or SQL, whatever you want to call it. So anyway, getting a little ahead of ourselves, but I just want to say that we're going to be covering regular expressions very soon, and then after we cover them, we'll mesh regular expressions with urllib.

A lot like your basic programs, or very complex programs rather, are just a combination of very basic tools, even some of these really complex tasks are a lot of times just the combination of really basic modules and tools that you already have. Maybe not if statements and all that, but, you know, urllib plus regular expressions equals a pretty darn good website parser already. You could also use something like Beautiful Soup, but if you look into Beautiful Soup, most of what Beautiful Soup is is urllib and regular expressions.

So anyway, that's going to conclude this video. If you guys have any questions or comments about urllib, please feel free to leave them below. If you have any requests for more information on urllib, or some of the other built-in site-packages, or even third-party modules that you want me to cover, I'm filming the series right now, so if you ask fairly soon after this video is posted, I'll probably be able to include it in this series. So anyway, that's it. As always, thanks for watching, thanks for all the support and subscriptions, and until next time.
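
A minimal sketch of the basic GET request described at the start of the video, assuming Python 3.4 or later. The helper name fetch_text and the UTF-8 fallback are assumptions added for illustration and do not come from the video. Because read() returns bytes, the sketch decodes them into a string before returning.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Open a URL with a plain GET and return the body as a string.

    fetch_text is a hypothetical helper name, not something from the video.
    """
    # urlopen sends a GET request when no data argument is supplied.
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()  # bytes, not str, in Python 3
        # Use the charset the server declared, if any; otherwise fall back.
        charset = resp.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset, errors="replace")
```

Calling fetch_text('http://www.google.com') would make the same kind of request as the x = urllib.request.urlopen(...) line in the video. It needs network access, so it is not called here.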
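
The encoding and POST steps from the video can be sketched as follows, using the same url and values as the transcript. Nothing is sent over the network here; the sketch only builds the request and prints the encoded forms. One detail worth noting: the search URL seen in the browser, with ?s=basic&submit=search on the end, is itself a GET request with a query string. It is passing data to urllib.request.Request that makes urllib issue a POST.

```python
import urllib.parse
import urllib.request

url = "http://pythonprogramming.net"
values = {"s": "basic", "submit": "search"}

# urlencode joins key=value pairs with & and escapes special characters.
query = urllib.parse.urlencode(values)
# The request body must be bytes, so encode the string as UTF-8.
data = query.encode("utf-8")

# Supplying data switches the request method from GET to POST.
req = urllib.request.Request(url, data)

print(query)
print(req.get_method())

# The two space encodings seen in the browser in the video:
print(urllib.parse.urlencode({"q": "hey check that out"}))  # spaces become +
print(urllib.parse.quote("hey check that out"))             # spaces become %20
```

Passing req to urllib.request.urlopen would then perform the actual POST, as in the video.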
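
A sketch of the custom User-Agent step, under a few stated assumptions. The exact user agent string pasted in the video is not shown in the captions, so the value below is a placeholder. The helper name save_response is hypothetical. Unlike the video, which writes str(respData) to a file opened in text mode, this sketch writes the raw bytes in binary mode, and it catches urllib.error.URLError (which covers HTTP errors such as 403) instead of a bare Exception.

```python
import urllib.error
import urllib.request

url = "http://www.google.com/search?q=test"

# Placeholder value: the real string used in the video is not in the captions.
headers = {"User-Agent": "Mozilla/5.0 (compatible; example-placeholder)"}

# No data argument, so this stays a GET; only the headers change.
req = urllib.request.Request(url, headers=headers)


def save_response(request, path):
    """Fetch a request and write the raw bytes to path.

    Hypothetical helper; returns False if the server refuses the request.
    """
    try:
        with urllib.request.urlopen(request) as resp:
            resp_data = resp.read()
    except urllib.error.URLError as e:
        # HTTPError (for example 403 Forbidden) is a subclass of URLError.
        print(str(e))
        return False
    with open(path, "wb") as save_file:
        save_file.write(resp_data)
    return True


# What urllib announces itself as when no User-Agent is set.
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_agent)
# What this request will send instead.
print(req.get_header("User-agent"))
```

Calling save_response(req, 'withHeaders.txt') would attempt the search request from the video. Whether a given site accepts it depends on that site's current checks, and an official API is usually the better route where one exists.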
Info
Channel: sentdex
Views: 269,050
Rating: 4.9358072 out of 5
Keywords: Python (Programming Language), Computer Programming (Conference Subject), Hypertext Transfer Protocol (Internet Protocol), Tutorial (Media Genre), urllib, internet, url, website, link, visit, programming, basics, python 3, python 3.3, python 3.4, python 2 and 3, python 2.7, 2.7, 3.3, 3.4, beginner, how-to, coding, easy
Id: 5GzVNi0oTxQ
Length: 24min 4sec (1444 seconds)
Published: Sat Jul 19 2014