Sumana Harihareswara - HTTP Can Do That?! - PyCon 2016

Captions
[applause] (host) Last up in this session today, we have Sumana Harihareswara, and she's going to be talking about: HTTP Can Do That?! [applause] (Sumana Harihareswara) Hello. Thank you very much for coming to something that is basically in your lunch slot. Thank you for being hungry for knowledge! [applause] So, this is: HTTP Can Do That?! A collection of bad ideas, by me, @brainwane of Changeset Consulting. That is my company. And I'd also like to thank the CART provider. Thank you. [applause] For the live captioning over to your left. So let's start with HTTP. I always, when I'm at the beginning of one of these talks and there is any kind of abbreviation like this, I have this pedantic need to know exactly what it stands for, even if I already knew, I really want the speaker to say what it -- it fulfills some itch in my brain. So, I'm fulfilling this itch now. HTTP stands for the hypertext transfer protocol. You've probably seen it at the beginning of most URLs, uniform resource locators. And it is -- oh this is great. I love when there are people laughing at my little jokes. Thank you so much for that. It makes me -- I also do a bit of standup comedy, and in standup, you're supposed to get at least 3 and up to 6 laughs in a minute, and I've clocked up to 5.2. That will not be happening in this talk, but I aim for at least one. So if 60 seconds have gone by, there will be a punchline coming, I hope. HTTP is defined in an Internet engineering task force (IETF) request for comment (or RFC), which you can read. And this is, for instance, a bit of RFC 7230, concerning message syntax and routing, where you can see that the user agent sends a request to a server and gets a response. You see there's ASCII art, so it can't be that intimidating. I'm going to assume that everyone in this room has actually written code that interacts with the Internet, and especially with the web, and especially with HTTP. 
But just a crash course to sort of remind you of what's going on when you type GET or POST or whatever, there is a client on the left. It may look like a deformed green human head, but it's supposed to be, you know, an API client, or an app or a browser or something, and on the right is a sort of ominous cubical -- sphere -- no, the other thing -- prism, a server. Anyway, so the client sends a request, which is a kind of HTTP message to the server asking for something or asking to do something, having to do with the state of the resources that are available on the server, and then the server sends a response back to the client. And you may be asking me, "Hold on, what are requests and responses?" Let me tell you! I like the imaginary "you" in my head that asks me questions that the next slide answers. I hope that imaginary "you" corresponds with many of you, and I'm sorry for any lack of congruence there. An HTTP message, a request or response, is text. It is text. It is text you can write, it is text you can read, texty text text text text. I'm kind of curious how the CART provider says... Anyway. Um... I'm just going to apologize right now to the captioner. I gave you the slides ahead of time but I sure do say a lot of weird things that aren't in the slides, don't I? (captioner) You got that right. [applause] (Sumana Harihareswara) For those of you who did not look at the captions, the captioner said "Thank you" and "You've got that right." This has gotten quite recursive and meta. [laughter] An HTTP message, a request and response, is divided in three parts, much like Gaul, for those of you who studied Latin. And the start line is the first part, and some things that might be in it: the HTTP version. In this talk, I'll be talking about version 1.1. The previous session in this very selfsame room talked about HTTP/2, which is future and cool and Blade Runner. I won't be talking about that. 
I'm talking about the old HTTP that's implemented everywhere, and so you can take stuff from this collection of bad ideas and implement them right now today. And then, within the start line, you also have, for instance, if you're requesting something, a method or verb, and a response status code, and I'll be going into all of those more later. Headers might include what kind of content it is or the length, or things like that -- metadata, right? That stuff Snowden talked about. And there's a sometimes optional body or payload. In this case it's a picture of a chandelier. I think in general, more message bodies should be pretty pictures of chandeliers and not, like, I don't know, abusive, harassing blog comments. Just an idea. So here's an example request. The start line: GET (the verb, what we want to do), slash (this is an address), the version of HTTP (1.1). Headers: here are just some of the headers, so just some of the headers that you'll see. Things like the host, what kind of content we're willing to accept, User-Agent, which I'll talk about more later. In this case the body is blank. There is no payload because you don't need one. What is this request saying? This request is saying, "Please give me the root that's at sumana.biz. "Give me the webpage that is the index page of this website." Here is the response that sumana.biz might send back. Incidentally, I think sumana.biz exists and is like a spam site in Thailand or something. The start line tells you the version of HTTP, and "200 OK," a status code and reason phrase. I'll talk about those more later. Here are some of the headers that you might get back that tell you how long it is, that it's HTML, the date, and last modified, and then the body, which is an HTML page saying that Sumanaville is a pretty rockin' site. You can't see it, it's just on my local host, so, sorry. Let's start by talking about methods, verbs. 
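To make the three parts concrete, here is a minimal sketch that writes the example request and response out as plain text and splits each one into start line, headers, and body. The helper name `split_http_message`, the `example-client/0.1` User-Agent value, and the exact header values are assumptions made for illustration; they are not taken from the slides.

```python
def split_http_message(raw):
    """Split a raw HTTP/1.1 message into (start_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# The request: "please give me the root of sumana.biz". No body is needed.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: sumana.biz\r\n"
    "Accept: text/html\r\n"
    "User-Agent: example-client/0.1\r\n"  # hypothetical client name
    "\r\n"
)

# A response the server might send back, with an HTML page as the body.
page = "<html>Sumanaville is a pretty rockin' site.</html>"
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(page)}\r\n"
    "\r\n"
    + page
)

req_start, req_headers, req_body = split_http_message(request)
resp_start, resp_headers, resp_body = split_http_message(response)

print(req_start)                       # the method, the path, the version
print(req_headers["host"])             # which site we are asking
print(resp_start)                      # version, status code, reason phrase
print(resp_headers["content-length"])  # length of the body in bytes
```

The point of the sketch is only that both messages are ordinary text with the same shape; a real client or server also has to deal with encodings, repeated headers, and bodies that are not text.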
Here are the really popular request methods or verbs: GET ("gimme") and POST ("here you go"). These are really well understood among web application developers in a lot of ways. This is the set of verbs you get if you are doing stuff in an HTML web form, so if you started doing web application development with forms in HTML, you know, in, like, Netscape in 1995, this is what you had; these are the Dave Matthews Band of verbs. They're very, very popular, but, you know, maybe you could go a little further and, like, expand your horizons. I sometimes worry about that joke because I'm sure they're a perfectly nice band, and in fact, the Dave Matthews Band is the reason that the GNU Mailman project exists. The Mailman mailing list software was originally developed to manage the fan clubs for the Dave Matthews Band. I know, isn't that amazing? Also, Mailman is having a sprint during the PyCon -- I need to stop advertising other things and do my talk. OK. [laughter] Here's the first bad idea of this session. You can create an API that allows POST but not GET. [laughter] There's more at this particular GitLab repo, "HTTP can do that," but this is in Python 2. If you subclass some bits of the standard library, BaseHTTPRequestHandler out of BaseHTTPServer, then you really have -- it's very DIY. You have to implement a method -- that is to say, a function within this class -- for every verb, every HTTP verb you want to support. So you don't have to implement do_GET, and therefore, if someone does a GET, then the server would just be like, "I have no idea how to deal with that." Here are some use cases: letters to Santa Claus. [laughter] Employee suggestion box. [laughter] Extremely moderated blog comments. [laughter] And let me just stop now here and put in a logistical note. You may remember that I said this was a collection of bad ideas, but sometimes I will actually be mentioning good ideas in this talk, which may blow your mind. How will you tell the difference? 
This is what a good idea feels like to me. This is a snapshot of the Recurse Center, an educational institution in New York City, and there's plants and whiteboards and natural light, and thoughts written in dry-erase marker that could be elegant and performant and inspiring and enlightening. And here's Horror World. This is what a bad idea feels like to me. You shouldn't do it. It's a red arrow pointing to the left. By the way, across the top there's a transliteration of the phrase "horror world" into Kannada, a language my parents speak. That's not how you say "horror world" in Kannada. That's how you say "hor-ur wurld" in Kannada. Sorry, captioner. [laughter] (captioner) Sure... (Sumana Harihareswara) This is the bad idea scale. [laughter in response to captioner's remark] [applause] It goes from "Horror World" to "natural light." And so giving the client no way to GET is a terrible idea, and you should not do it because if nothing else, you'll want Santa Claus to be able to have, like, a web dashboard to look at the letters, and that will require GET. Let's think about the things you might want to do with data: create, read, update, and delete. Well, if you think about what verbs you can use in HTTP to do these various things, the really common ones, GET and POST, you would read with GET, and then create and update and delete with POST. You're using the same word to do three different things. I find that inelegant. I did sort of a stamp there as like, "Sumana disapproves this message." Or, set of messages. Anyway... Underappreciated methods, let's talk about some of those. Outside the Dave Matthews Band into, like, the Mountain Goats and, like, Neko Case of verbs. DELETE. DELETE is to delete a resource. That's what it -- it does what it says on the tin. So if you were to implement DELETE, then, you know, maybe I would start up a bit of server, serving at port 8000, the official port of localhost. 
And in a different terminal window, I open up an interactive Python session. I import the requests library, which is great. Let's have a hand for requests. [applause] Requests makes it extremely easy for you to send HTTP requests of various kinds and then inspect the responses. So I GET localhost:8000, and in the terminal window where I was running serverwdelete.py, you can see the log saying that that GET request came through with a code "200 OK" being part of the response. So, just scrolling that down so you can see more stuff. So here's the code for serverwdelete.py, where there's a do_DELETE function, where if that file gets deleted, I just call os.remove. There's no authentication or authorization of any kind here. Do not implement this in production. Despite the fact that there is an OSI-approved license on this code, I really don't want you to ever use this for anything other than pranks and self-edification, please. The only bit of safekeeping here is that the only file path that can be deleted here is FileToDelete.txt. So, in another terminal, I say, "Hey, ls FileToDelete." Yep, it's there. And then in that Python session, I use the requests library and I delete that path, and the log says, "Yep, DELETE /FileToDelete, 204." 204 means there's now a successful deletion that's happened. And now I do that ls again. "What? What file? Never heard of that file." So it totally worked. Within the requests library, you can, as I mentioned, look at and dissect the response that you got. So, status_code: 204, reason: no content, and that is the proper response and reason phrase to give when a deletion has successfully happened. Is this a good idea? Kind of and maybe, right? Don't just implement DELETE willy-nilly, and then be surprised when a random cracker deletes lots of files that you cared about. 
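A minimal sketch of the same kind of DELETE demo, written here for Python 3's `http.server` and `http.client` rather than the Python 2 code and the requests library used in the talk. It is not the speaker's `serverwdelete.py`: the class name `DeleteOnlyHandler`, the temporary working directory, and the 403 and 404 branches are assumptions added for illustration. As in the talk, there is no authentication at all, so this is for play only.

```python
import http.client
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

WORKDIR = tempfile.mkdtemp()      # scratch directory, so nothing real is at risk
DELETABLE = "FileToDelete.txt"    # the only file a client is allowed to delete


class DeleteOnlyHandler(BaseHTTPRequestHandler):
    """Implements DELETE and nothing else; there is deliberately no do_GET."""

    def do_DELETE(self):
        target = os.path.join(WORKDIR, DELETABLE)
        if self.path != "/" + DELETABLE:
            self.send_error(403)          # any other path is off limits
        elif not os.path.exists(target):
            self.send_error(404)          # already gone
        else:
            os.remove(target)
            self.send_response(204)       # success, and no body follows
            self.end_headers()

    def log_message(self, format, *args):
        pass                              # keep the demo output quiet


def send(port, method, path):
    """Send one request and return (status code, reason phrase)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status, resp.reason


open(os.path.join(WORKDIR, DELETABLE), "w").close()   # create the file

server = HTTPServer(("127.0.0.1", 0), DeleteOnlyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = send(port, "DELETE", "/" + DELETABLE)    # deletes the file
second = send(port, "DELETE", "/" + DELETABLE)   # nothing left to delete
get_attempt = send(port, "GET", "/")             # no do_GET was written

server.shutdown()
server.server_close()
print(first, second, get_attempt)
```

Because the handler defines no `do_GET`, the standard library answers the GET with a 501 on its own, which is the same mechanism behind the POST-but-not-GET idea from earlier in the talk.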
But within a bit of a walled garden where there's a lot of micro-services and programmatic creation and reading and updating of resources, maybe with proper auth you want to implement DELETE so that it's also possible to programmatically delete resources. Here's another underappreciated method. It's PUT, AKA "here you go." "But wait," you might ask, imaginary listener, who might also be the people in this audience, "I thought you said POST meant 'here you go.'" Well, that leads us to the question: "Well, Socrates, what is the POST?" "Well, Glaucon, I'm glad you asked." The standard, right, the RFC says that POST basically means "This is above our pay grade; take it to the boss," also known as overloaded POST. And the idea is, there is a web application, perhaps written in Python, that contains the logic of what to do with a POST that contains these characteristics at this path, and so on. Often our convention seems to be, our idiom seems to be that we use POST as POST-to-append, meaning "create a new item in this set." Let me show you an example. Here, the sample payload or body is pictures from a textile museum that I went to in England. These are cards in a Jacquard loom that are punch cards that are the forebears of the punch cards that we ended up using for mid-20th-century computing. So, PUT versus POST. These are identical requests except for the verb. On the left, PUT /cards/5 with this body means "this picture should go at that address." It is unambiguous about changing the state on the server of the resource. POST means -- tells the web app that this picture applies to /cards/5 somehow; you figure it out. This gives you more flexibility, but it also makes things more ambiguous for your and other maintainers of this code's future debugging. So you could use PUT for "create and update," or some combination of PUT and POST, and you can use GET for "read" and DELETE for "delete." And that would make things a little bit more granular. 
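The PUT versus POST-to-append distinction can be sketched with a toy in-memory model. This is not an HTTP server and not code from the talk: the class name `CardStore`, the `/cards` collection path, and the placeholder body bytes are assumptions, and the status codes follow the common convention of 201 for "created" and 200 for "replaced".

```python
import itertools


class CardStore:
    """Toy model of server-side state, keyed by resource path."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, path, body):
        """PUT: 'this body should go at exactly this address.'"""
        created = path not in self.resources
        self.resources[path] = body
        return 201 if created else 200    # created vs. replaced

    def post(self, collection, body):
        """POST-to-append: 'add this to the set; you pick the address.'"""
        path = f"{collection}/{next(self._ids)}"
        while path in self.resources:     # skip addresses already taken by PUT
            path = f"{collection}/{next(self._ids)}"
        self.resources[path] = body
        return 201, path                  # the server reports the new address


store = CardStore()
picture = b"(bytes of a Jacquard loom card photo)"   # placeholder payload

# Sending the same PUT twice leaves exactly one resource at /cards/5.
put_statuses = [store.put("/cards/5", picture) for _ in range(2)]

# Sending the same POST twice creates two separate resources.
post_results = [store.post("/cards", picture) for _ in range(2)]

print(put_statuses)
print(post_results)
print(sorted(store.resources))
```

Repeating the PUT is harmless, while repeating the POST keeps adding items; that difference is what makes PUT easier to reason about when a request has to be retried.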
And PUT is a very good idea. You'll see it used in APIs and API client libraries right now in lots of places. So I suggest that you investigate PUT. Another underused method is PATCH. That's "update just part of this document or resource." It's a good idea. It takes a little bit more learning on how to do it but you can absolutely use it. OPTIONS: ask what verbs the client's allowed to use for a specific path or server-wide. This is especially useful when you're exploring how to use an API that's new to you. So, OPTIONS. Very good idea. Try it out. Here's a super cool method. If you walk away with nothing else but this from this talk, I will be quite chuffed, quite pleased. HEAD is like GET, but just for metadata. So here is a comparison. So on the left is a GET request and on the right is a HEAD request. And they're exactly the same, except that the respon-- of course, the verb, and in the response for the second one, you don't get the body, the payload. You just get what the start line and headers would have been for a GET request of this type. And this is going to save you a bunch of time, right? This is going to save the server time in creating and sending you the body, and it's going to save you network bandwidth time. So here's an example. I start up IPython, right? I import requests. I get a URI. This is in this case some picture off of Wikimedia Commons, right, some JPEG. And then I use the %timeit command within IPython, which tries something three times and then tells you what the best time was. This is not proper data science, right? This is an anecdote. This is artisanal data. But it's still useful because you can see that when you GET that, it took 1.33 seconds, but when you HEAD, it took only 163 milliseconds, and that can really add up. Think about the times when you're writing a web application and you actually -- or you're calling some API and you don't need the whole body, because all you need to know is, does this resource exist at this path? 
Do I have permission to get it? How long is it? When was it last modified? And so on and so on. You can use this, for instance, when creating a table of contents or something like that. And it will save you a bunch of time and it will save a bunch of work for the server that you're calling. So HEAD is a fantastic idea. It's implemented in lots of places. Play with it. See how it saves you time. Let's talk about headers. So, some popular headers include Content-Type ("what kind of thing is this?") and Content-Length. So, Content-Type is also known as MIME, or Mime, and since it's also known as Mime with lower case, I'm not going to tell you what it stands for. Interchange, blah, blah. Some of the content types you might have seen, they're sort of arrayed in these hierarchical -- you know, there are these sort of top level name spaces, and then things below that. So there's the text, text/HTML, text/plain. Application/json or /xml, Chemical. Seriously. There is a top level chemical name space. Hats off to you, chemists. Sometimes you see these call-and-response pairs. A request says, "Here's what I'm willing to accept," and then the server says, "Here's what I'm willing to give you." Accept-, Content-Encoding, Accept-, Content-Language. If-Match, If-None-Match, ETag. So the ETag is sort of a versioning system for individual resources, so that there is a particular number, kind of like a git log hash, that changes when the document changes so that the requester can say, "Well, if this matches such and such, then give it to me, otherwise not." So this is a way to save time. Similarly, the Cache-Control header, and If-Modified-Since, If-Unmodified-Since, and Last-Modified are ways of saying "Hey, update this. "Here's a post to this resource, "but only if I'm pretty sure that the resource "that you currently have is -- here's the last time "that it was modified." So it's a way to avoid conflicts. 
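A minimal sketch of HEAD and of a conditional GET with ETag and If-None-Match, using only Python 3's standard library against a throwaway local server. The handler name `EtagHandler`, the helper `fetch`, the page content, and the way the ETag is derived (a truncated SHA-256 of the body) are all assumptions for illustration. The exact-string comparison of If-None-Match is a simplification: real servers also handle lists of tags, `*`, and weak validators.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html>Sumanaville is a pretty rockin' site.</html>"
# The tag changes whenever the body changes (quotes are part of the syntax).
ETAG = '"' + hashlib.sha256(BODY).hexdigest()[:16] + '"'


class EtagHandler(BaseHTTPRequestHandler):
    def _send_head(self, status):
        self.send_response(status)
        self.send_header("ETag", ETAG)
        if status == 200:
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            self._send_head(304)          # client's copy is current: no body
        else:
            self._send_head(200)
            self.wfile.write(BODY)

    def do_HEAD(self):
        self._send_head(200)              # same headers as GET, never a body

    def log_message(self, format, *args):
        pass


def fetch(port, method, headers=None):
    """Return (status, ETag, Content-Length header, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/", headers=headers or {})
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("ETag"),
              resp.getheader("Content-Length"), body)
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), EtagHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

plain_get = fetch(port, "GET")
head_only = fetch(port, "HEAD")
conditional = fetch(port, "GET", {"If-None-Match": plain_get[1]})

server.shutdown()
server.server_close()
print(plain_get[0], head_only[0], conditional[0])
```

The HEAD response still reports the length of the page without sending it, and the conditional GET comes back as a 304 with an empty body because the tag the client presented still matches.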
The Cache-Control header is interesting because it lets you be more granular. By the way, you technically could do conditional GETs where you say, "Only give this to me "if it's in a cache on the server side." Who here is a vegan? A few vegans. Right, so you could sort of, like, be a data vegan where you live lightly upon the Earth by only grabbing things that are in caches and thus not causing new work for the server. [laughter] Tell me how this goes. Here's a popular header, User-Agent, a way of saying, for instance, "I am such-and-such a client library," "I am such-and-such an app," "I am such-and-such a browser." Here's an unpopular header, From: the email address of the person making the request. This is actually really in the spec. I am not lying to you. I'm sure someone would, like, shout at me if I was lying. This is that kind of, you know, integrity-based audience. Otherwise known as hecklers. This is seriously -- I think it's basically a holdover from when you were a sysadmin and you got, like, ten hits a day. So, you know, you would look at it in your logs and you would say, "Oh no, "my PERL 0.1 server crashed earlier and Allison wanted that one page. "I should email Allison and tell her the site is back up." You could use this for really, really bad auth: "Yes, you're only allowed to get this resource "if you got to type one of the email addresses "in a request you control." [laughter] You could use it as a guest book, like, "Yes, I saw your site launch." [laughter] Coded messages meant for the network surveillor. [laughter] I bet you so much that in the logs of every website you've ever run, no one has ever legitimately used From. So you probably aren't even capturing it in your logs, but the NSA is. So, you know, you could have "From MiddleFinger@Snowden.org." [laughter] These are bad ideas. Don't do them. This is a direct route to Horror World. Here's another spy trick. Header fields are case-insensitive. 
So you could vary the case of the headers you send and have, you know, an encoding in almost a steganographic way. This is a terrible idea. No matter what your threat model is -- I'm just imagining this diagram where there's a bunch of different boxes and they all just point to "They will figure it out." All right, here's a popular header: Host. And in fact it's a required header. It is required in request messages. And I'm going to show you here something that is not Python but that is useful anyway, as so many non-Python things turn out to be. And it's Netcat, which is built into basically most GNU/Linux systems that you're going to have in a lot of development environments. It's a way to directly write, just in your terminal with your keyboard, HTTP messages. So you invoke Netcat and tell it what domain it's going to be pointed at and what port -- in this case, myhostname.tld and 80. And here's the start line, GET /bicycle (just a sample path), HTTP/1.1 (the version), Host: myhostname.tld. That's the header bit, right? So here's Host and path working together. This is a web page; it's my local bookstore. At the top is the URL bar. Astoriabookshop.com/event/storytime-73. Isn't it nice they've had so many story times? And then below that, I've opened up the developer toolbar and the request headers say GET /event/storytime-73 as part of the start line, Host: www.astoriabookshop.com. So if you concatenate those together, you get the entire URL. Similarly, here's a BBC news page: "Do butterflies hold the answer to life's mysteries?" I have cruelly covered up the answer with the developer toolbar. [laughter] I'm denying you both lunch and butterfly mystery solutions. Truly, I am a terrible speaker. So here is the URL bar, bbc.com/news/magazine-[number]. And then you see again the request headers, the start line, GET, and the stuff that's coming off the root, and the host is www.bbc.com. Hold on, why do we need to repeat this if it's in the URL? 
This is inelegant, isn't it? "INELEGANT!" The stamp. Look at this. Netcat has been told now twice, both in the invocation and in the Host header, that it's myhostname.tld. Surely, every other person who's ever thought about the internet has not made this redundancy without thinking about it. You are correct. There is a reason. HTTP is separate from the domain name system. The first time when we're invoking Netcat and saying myhostname.tld, that's telling Netcat and telling the operating system, "Hey, you're going to be using DNS to look up the IP address "that is correlated with this particular domain name." And so therefore, if you're in a situation where you're just dealing with bare IP addresses that you're sending stuff to, then the messages will still work, and possibly, you know, more saliently to you, Host helps route requests among different domains that sit on the same server, otherwise known as subdomains or virtual hosts. So here are some examples. Here's www.debian.org, the main website. Here's bugs.debian.org, the bug tracker; lists.debian.org, the mailing list; wiki, the Wiki, the knowledge base; and so all of these come off the root, and so you would say, "Host wiki," or "Host bugs," and so on. Here's something interesting you can do. In requests, you can tell it arbitrary headers to put into a request, using the requests Python Library. So here I requested debian.org and I mentioned that the host I wanted was "sumanarific." There is no sumanarific.debian.org, I believe. Maybe there ought to be. And here's the text of the response; it basically says, "Hey, welcome to mirror-csail!" It's kind of an "I just installed this server" page, so probably, you know, you should watch out for that sort of thing. Here's a spam story. I installed Drupal once. We all make our youthful sins. It was fine, it was fine. Drupal is a perfectly fine project. 
OK, so the next day, I went to my 404 logs and I saw that there was "page not found" for a website that wasn't mine. That seemed odd. That was contrary to my understanding of how the web works. I have anonymized this. I think they're a little bit more subtle these days than making their websites "myphishingsite.biz." And I thought, "What in the world?" And I looked in the logs and I saw that the log said that the GET request was for http://myphishingsite.biz. And the thing is, you might think, "Oh, well maybe that's just a legitimate error of someone who started typing in the URL bar and forgot to take out the other URL that was already there." But if that happened, it would start with a slash. Everything here is going to come off of the root. All legitimate requests are going to start with a slash. So then I started playing around with -- "What in the world is going on here?" You know, I thought I was going to spend that week, you know, customizing my Drupal install, and instead I entered, like, this murky world of spam and web malformation. And I was able to intentionally malform my request to match what I saw in my logs. So if you use Netcat, right, OK, you tell Netcat, "Invoke myhostname.tld," but then the GET request goes to http://spam.com and Host spam.com, just to make sure that spam.com shows up a few different places in these 404 logs, or if you want it to look a little bit more legit, then Host spam.com, but GET something that looks like it might have come off of the root and you just were looking for a resource that wasn't there, like /viagra-bitcoin. I would be really surprised if there were any resource anywhere on the web called viagra-bitcoin, but "there are more things in heaven and earth than are dreamt of in my philosophy." This is a terrible idea. If you learn of an interesting trick from spammers, probably you shouldn't do it. That's just a Sumana's Life Lesson. You can define your own header. 
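The Netcat demonstrations and the role of the Host header can be sketched in Python with a raw socket, which plays the same part as typing a message by hand. Everything named here is an assumption for illustration: the host names `wiki.example.test` and `bugs.example.test`, the handler `VirtualHostHandler`, and the helper `raw_get`. The connection always goes to 127.0.0.1; only the Host header decides which "site" answers.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Two hypothetical sites sharing one server and one IP address.
SITES = {
    "wiki.example.test": b"the wiki\n",
    "bugs.example.test": b"the bug tracker\n",
}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route on the Host header, ignoring any ":port" suffix and the case.
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host)
        if body is None:
            self.send_error(404, "No such virtual host")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def raw_get(port, host_header, path="/"):
    """Write an HTTP request by hand, Netcat-style, and return the raw reply."""
    message = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(message.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:              # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

wiki_reply = raw_get(port, "wiki.example.test")
bugs_reply = raw_get(port, "bugs.example.test")
unknown_reply = raw_get(port, "sumanarific.example.test")

server.shutdown()
server.server_close()
print(wiki_reply.splitlines()[0])
print(unknown_reply.splitlines()[0])
```

The same TCP destination returns different pages depending only on the Host line, which is why the name has to be repeated inside the message even though it was already used to find the server.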
There's a set that is defined in these RFCs, but header fields are fully extensible. There is no limit on the introduction of new field names, each presumably defining new semantics, nor on the number of header fields used in a given message. Usually if you are coming up with your own header, whether it's a request header or a response header or both, you prepend X so that people understand that this is something that you made up. Here is an example. Wikipedia and the other Wikimedia projects have X-Wikimedia-Debug. This is an HTTP request header that their developers can use in order for a particular request to choose a particular backend or Varnish, to choose caching behavior, to record a trace and profile a request, to add and look at particular debug logs, to go into read-only mode, and there's browser extensions they've created for Chrome and Firefox so that you can say, "For this particular request, do these things in the X-Wikimedia-Debug header." I think this is super cool. There's more at this address. So here's an example from a response side instead of a request side. This is a Flask app I made that remixes the names of physics articles on Wikipedia to come up with sci-fi novel titles, like "Lorentz Relation" and "Ambiguity Particle." The code for this is available. If you end up using this to name your sci-fi novel, I will be bewildered and pleased. So this is a Flask app. So I added some code here. Why don't I zoom in on the specific code that changes a header. You add a new header here with X-Sumana-Is-Amazing, the value being 'Indeed, Verily So'. And then, if you look at the response to a GET request and you zoom in, you see: X-Sumana-Is-Amazing: "Indeed, Verily So." And it looks so official when it shows up in the developer toolbar. It looks like it must be true. [laughter] It makes me feel amazing to see that, actually. You should use this for days when you need better morale. 
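A minimal sketch of adding a made-up response header. The talk's version is a Flask app; this swaps in a bare WSGI app and a small wrapper from the standard library so that it runs without installing anything and without opening a network port. The names `title_app`, `add_header`, and `call_app` are assumptions for illustration; only the header name and value come from the talk.

```python
from wsgiref.util import setup_testing_defaults


def title_app(environ, start_response):
    """Stand-in for the real app: always returns one sci-fi title."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Ambiguity Particle\n"]


def add_header(wrapped, name, value):
    """Wrap a WSGI app so every response carries one extra header."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return wrapped(environ, patched_start_response)
    return middleware


def call_app(wsgi_app):
    """Invoke a WSGI app directly and collect status, headers, and body."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a plausible GET / request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


app = add_header(title_app, "X-Sumana-Is-Amazing", "Indeed, Verily So")
status, headers, body = call_app(app)

print(status)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Keeping the header in a wrapper rather than inside the app is a design choice made for the sketch: it makes the bespoke header easy to remove again, which fits the caution about not building an alternate Internet out of custom headers.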
The code is up, for all the good it'll do you. Is this a good idea? If you're Wikipedia, maybe. Look at the cool things they've done with that bespoke header. But be careful that you don't sort of write an alternate Internet in an unnecessary way. There's a lot that's already built into HTTP. You don't necessarily need to define your own header all the time. Don't Balkanize things. Let's talk about status codes. So, 100 and 101 are informational. That's "Oh yeah, go on." 200 series, "Success." The 300 series, "Redirecting you." 400 series, "Client error, you screwed up." 500 series, "Server error, I screwed up." So these are the sort of five -- like the five kingdoms of like, you know, animals and plants and fungi and stuff. These are the five kingdoms of response codes. And technically, status or response codes, there's two parts. There's the code, which is that three-digit int, and there's a reason phrase, which is, you know, these English words that are standard. The spec -- the RFC says that a client should ignore the reason phrase content, so you're on notice. You should just be looking at that int. Here's a few that are really cool that not enough people know about. 410 Gone. It was here, now it's not. So that means, for instance, "Hey, I recognize that you are going to the address "for that April 2012 coupon. "It was there. I recognize that fact. I am not gaslighting you. "You are right that there was something there. "However, it's gone and it's not coming back." I feel like this is just a more respectful thing to do than 404 when you know that a resource used to be there. 304 Not Modified. This is a response to that conditional GET I was talking about earlier. "You said, 'GET this if it's been modified "'since such and such a date,' and it hasn't been." So I love that there are very economical ways to say these kinds of things that actually provide a tremendous amount of semantics for any API client or for other parts of your web app. 
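The five classes and the code-plus-reason-phrase split can be sketched with Python 3's `http.HTTPStatus`. The function name `describe`, the wording of the class labels, and the unregistered example code 799 are assumptions for illustration.

```python
from http import HTTPStatus

# The first digit of the three-digit code picks the class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (class label, standard reason phrase or None) for a status code."""
    family = CLASSES.get(code // 100, "outside the five standard classes")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = None    # Python's registry has no entry for this code
    return family, phrase


for code in (200, 304, 410, 799):   # 799 is a made-up, unregistered code
    print(code, describe(code))
```

A client that follows the advice in the RFC branches on the integer (or just its first digit) and treats the phrase as decoration for humans.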
451 Unavailable For Legal Reasons. Server is le-- this is part of the Internet, this is part of HTTP. Server is legally required to reject the client's request, whether that's GET, POST, whatever. "Can't let you see that; it's censored." So for instance, the government of India disagrees with everybody else in the world regarding the borders of India, so there are maps of India that are, you know, accurate that you're not allowed to distribute to Indian IP addresses, basically. Yeah, 451 is pretty useful. And there's the DMCA request and this and that and the other. You might ask, "Why is this a client-side error? Why is this in the 400 block?" In "RESTful Web APIs" by Leonard Richardson and Mike Amundsen, they suggest: "This is considered a client-side error even though the request is well formed and the legal requirement exists on the server side. After all, that representation was censored for a reason. There must be something wrong with you, citizen." [laughter] This is straight up part of HTTP now, because as of February 2016, this is a standard. And you can use it, and it's a good idea. And again, like 410 Gone, I think it's a respectful thing to do for your users so you don't gaslight them into thinking, "Oh, I guess that doesn't exist." No, it exists, and a government won't let us help you. I think that's a bit of consciousness-raising you can do, and it's just straight up more accurate and more precise. Now some WTF responses. All of these were found in the wild. There were real, running servers that gave these as responses to just a straight up GET on the root of a website. "Code: 126, Reason: Incorrect key file," and then a bunch of SQL stuff. Don't let attackers see this! [laughter] "301, Reason: explicit_header_response_code." I think someone forgot to substitute a variable here. 403, which is Forbidden. "You've got to ask yourself one question: Do I feel lucky?" OK, that's witty. All right, you get a pass. This one, though... 
[laughter] "Reason: can't put wasabi in bed." I will buy a Portland area microbrew for someone who can help me understand this one. [laughter] "404, Reason: HTTP/1.1 404." That's not a reason, that's the same thing again. 404 Not Found, and then some PHP code again. Don't show this to your attackers! "200, Forbidden." No, we have another code for that. [laughter] 403 is Forbidden. Don't be that person who starts saying "goodbye" instead of "hello" because, "Like, it's all socially constructed, man." Yes, it is a social construction, and we depend on it. Do not break language. This has been a public service announcement. [laughter] "404, Apple WebObjects." Apple WebObjects may be an error, but it's not 404. "404, forbidden." Again, we have 403 for that. 434, which doesn't exist; reason: completely nonsensical. "451, unknown reason." No, we actually do know! It's the legal one. "503, Backend is unhealthy." My butt is fine, thank you. "520, Origin Error," so not a real thing. "525, Origin SSL Handshake error." I think this basically was a Cloudflare error that end users are never supposed to see. Well, whoops. 533 -- my friend Audrey Eschright said, "Is this an SAT question?" [laughter] "732, Reason --" OK, that's, like, beyond the five kingdoms I mentioned, right? Seven? And then this copyright page is the reason? "999, Request denied." It has been brought to my attention that 999 is an emergency phone number in much of the world. Still don't do this. If you want to do this: changing your reason-phrases. So here is some Python 3 code. We're later in the talk now, so we've gone from Python 2 to Python 3, as y'all should do. You import http.server, part of the standard library, you subclass some things, and you just change this dict so that responses[200] corresponds to "Oll Korrect" or "Oh Kay," which are apocryphally the words that OK is an abbreviation for. 
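A minimal sketch of the reason-phrase prank just described, under the assumption that it works by overriding the `responses` mapping on a `BaseHTTPRequestHandler` subclass. This is a reconstruction for illustration, not the code from the talk: the class name `OllKorrectHandler`, the `fetch_status` helper, and the response body are made up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OllKorrectHandler(BaseHTTPRequestHandler):
    # Copy the standard table of {code: (short phrase, long description)}
    # and swap out only the short phrase for 200.
    responses = dict(BaseHTTPRequestHandler.responses)
    responses[200] = ("Oll Korrect", "Request fulfilled, document follows")

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)           # phrase is looked up in `responses`
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def fetch_status(port):
    """GET / and return (status code, reason phrase) as the client sees them."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status, resp.reason


server = HTTPServer(("127.0.0.1", 0), OllKorrectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reason = fetch_status(server.server_address[1])
server.shutdown()
server.server_close()

print(status, reason)
```

The integer is still 200, so a well-behaved client that ignores the phrase is unaffected; only people reading the raw status line see the change.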
The code is online, and I go to localhost, and indeed, the status code that comes back is "200 Oll Korrect." Don't do this! Don't break language. This has been a repetition of my public service announcement. There's so much more. There really -- I could have spoken for the length of PyCon on amazing things you can do in HTTP. You can say specifically, "Don't cache this. Hey browsers, this is a fast-moving news event. Do not cache this document." You can pass instructions directly to the server or client. There are these methods I didn't get to talk about. There's a specific response that says, "Uh, state change, conflict, sure you want to do this overwriting business?" There's look-before-you-leap requests, where you say, "Well, do you think -- hey server, if I posted this five-gig file to this address, would that work, would that be OK for you?" And the server can say, "Yeah, come on down," or "Uh, no." Resources at HTTPS and HTTP URLs can differ. HTTPS is not just HTTP plus TLS. There can be entirely different addressing schemes. You can rank your preferences in sort of like this ranked preference voting using a "q" in the Accept headers. Content-Disposition, that is the header that lets you say, "Treat this like an attachment." There is so much more. And studying HTTP, studying the way that it works and understanding the "why" of why it works, it gave me a feeling of power; it gave me a feeling of increased capability, for sure; but it also gave me a sense of wonder that we're standing on the shoulders of giants. If you look in Python's implementation in the standard library, there's a comment at the top that says, "Well, this is in accordance with these specs," written by people like R. Fielding, Roy Fielding, one of the people who created HTTP. 
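The ranked-preference idea mentioned above, the "q" in Accept headers, can be sketched roughly as follows. This is a simplified illustration, not a full implementation of HTTP content negotiation: the function name rank_accept is hypothetical, a missing q is treated as 1.0, an unparsable q is treated as 0.0 (an assumption made here for simplicity), and ties keep the order in which the client listed them rather than applying the specificity rules from the spec.

```python
def rank_accept(header):
    """Return [(media_type, q), ...] sorted from most to least preferred."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # a media range without an explicit q counts as 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: unreadable weight means "not wanted"
        ranked.append((media_type, q, position))
    # Highest q first; equal weights keep the client's original order.
    ranked.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(media_type, q) for media_type, q, _ in ranked]
```

A client that sends "text/plain;q=0.5, text/html" is saying it will take plain text but would rather have HTML, and rank_accept puts text/html first for that header.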
And I love that we all get to learn from each other in this way and build on top of work done by others that helps us see new ways of thinking about resources, about state, about representation, about economically saying complicated things. I had a real sense of amazement. And it makes you think, if we had used more of these built-in semantics instead of sometimes reinventing the wheel and implementing stuff -- in scripting languages on top of it, what might the web have been? But also, we still have a chance; what might it be? The web is still young. What will you do? Well, you can read and play. RFCs 7230 through 7235 are the definition of HTTP 1.1. Go ahead and read them. They are actually not that intimidating. Like I said, there's ASCII art. Use the requests library in Python. Use tools that are available on the command line like Netcat, Wget, netstat, and Telnet to play with bare messages on the wire. Basic HTTP servers in your favorite language are available. Go ahead and read that code. It's actually a great way to understand what is going on when you send and receive requests and responses. And you can look at the code -- again, for personal edification and pranks only please -- from this talk on GitLab. I'd like to thank Leonard Richardson, Greg Hendershott, Zack Weinberg, The Recurse Center, Clay Hallock, Paul Tagliamonte, and Open Source Bridge for supporting me in the development of this talk. I'd also like to thank Julia Evans, Allison Kaptur, Amy Hanlon, and Katie Silverio for the example that they provided me in projects and talks that are ridiculous bad ideas and fantastic for play, exploration, and learning. Thank you. I'm ready for questions. [applause] (host) So we do have a few minutes for questions. If you're leaving the room, please tiptoe and be quiet. (Sumana Harihareswara) I'm going to thank the captioner again while people line up. I'm sorry, again, and thank you. [applause] (captioner) You say that now... 
[laughter] (Sumana Harihareswara) I don't know where you live, but maybe I can buy you a beer, or like an Internet beer or something. (captioner) No way. I'm heading home as soon as possible. (Sumana Harihareswara) Go ahead, please, ask your question. [laughter in response to captioner's remark] I'm sorry, please go ahead and ask your question. (audience member) I just got upstaged by the captioner. That's awesome. I was wondering if you know where the number 451 comes from. (Sumana Harihareswara) Yes, I do know where the number 451 comes from, and I think you're guessing the correct guess, which is that it's based on Ray Bradbury's classic novel Fahrenheit 451, and that is definitely a very cool thing about it. I like to sort of leave it as a bit of an Easter egg, but, like, I'm also happy to open up the Easter egg and say, "Yay, chocolate!" [laughter] (audience member) I remember seeing somewhere this idea -- (Sumana Harihareswara) I'm sorry, could you stand closer to the mic so I can hear you? (audience member) I remember reading somewhere that POST and PUT, that one of them is usually used for, like, idempotent operations and the other one not. Is that standardized anywhere, or is that just common practice? (Sumana Harihareswara) My understanding, and I'm totally willing to be told that I'm incorrect on this, is that PUT simply says, "Here is the resource to go at that address" and then, you know, POST is a little bit more ambiguous in this way. And then I feel like I would want to do a little bit more of a survey to understand what is -- what we're going to call common practice. I think that's something that is in a sense a matter for sociologists, almost. But I'm not sure. That's really my answer to you. We have a few more minutes. If the captioner wants to, like, impugn me some more, quite rightly so, you are welcome to do so. (captioner) No. You were entertaining. [laughter] (Sumana Harihareswara) Aw! Thank you. Oh, we have one more question. Go ahead. 
(audience member) Could you talk a little bit more about when it is or isn't a good idea to add a custom header? Like, what are some more use cases for that? (Sumana Harihareswara) Oh, I see, OK. Ideas for custom headers. So I think that one example that comes to mind, by the way -- I think it's good because my definition of use may be a little bit wider and more fluid than some other people's, but on debian.org, they have a response header so that every single response coming from www.debian.org includes an X-Clacks header which mentions Terry Pratchett, the author. Terry Pratchett wrote the Discworld novels, which include a clacks system, C-L-A-C-K-S, that is very similar to our telegraph or Internet, and it is the custom among the clacks operators that when one of them dies, they keep saying that person's name in clacks signals throughout the system, because as long as your name is being spoken, then you never truly die. So I think that's useful because I think it's really important to remember our history and people that we want to honor and value. In a somewhat more functional, production-oriented, "I need to get VC money for this" way -- unless someone is going to be offering me VC money for just remembering Terry Pratchett as a service or something. I made my session chair laugh! I'm so happy. I think that an example of a good header is one where the semantics and sort of the domain knowledge of what you're doing in a particular sort of ecology between a server and perhaps particular clients is one where you want a fine gradation of response, so that -- I mean, you can sort of do this with, like, User-Agent. You can say, "Oh, User-Agent, give a different response to different user agents" or something like this. But if there is a particular kind of difference that you want people to be able to trigger, then I think that a bespoke header can be useful in this way. 
It seems like performance-y things, when there are different kinds of tradeoffs that people might want about, like, performance versus ease of use or convenience or security or something like that, and you want the client to be able to programmatically choose where on that spectrum they want the response to be and the processing time to be, that might be good. (host) And that's all the time we have. Let's thank Sumana again. (Sumana Harihareswara) Thank you. [applause]
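The custom-header idea from the last answer could be sketched as a small WSGI wrapper. This is an illustration under stated assumptions, not how any particular site does it: the header name X-Clacks-Overhead and the value "GNU Terry Pratchett" are assumed here as the commonly used form of the clacks header, and the names add_clacks and hello_app are hypothetical.

```python
def add_clacks(app):
    """Wrap a WSGI app so every response carries the memorial header."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Assumed header name and value; appended to whatever the
            # wrapped application already set.
            extended = list(headers) + [
                ("X-Clacks-Overhead", "GNU Terry Pratchett")
            ]
            return start_response(status, extended, exc_info)

        return app(environ, start_with_header)

    return wrapped


def hello_app(environ, start_response):
    """Hypothetical stand-in for a real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def collect_headers(app):
    """Call a WSGI app directly and return (status, headers, body)."""
    seen = {}

    def fake_start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = headers

    body = b"".join(app({}, fake_start_response))
    return seen["status"], seen["headers"], body
```

Wrapping the application instead of editing each handler keeps the header in one place, which is one reason a bespoke header like this is cheap to add and cheap to remove.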
Info
Channel: PyCon 2016
Views: 8,634
Id: HsLrXt2l-kg
Length: 46min 27sec (2787 seconds)
Published: Tue May 31 2016