[applause] (host)
Last up in this session today, we have Sumana Harihareswara, and she's going to be talking
about: HTTP Can Do That?! [applause] (Sumana Harihareswara)
Hello. Thank you very much
for coming to something that is basically
in your lunch slot. Thank you for being
hungry for knowledge! [applause] So, this is: HTTP Can Do That?!
A collection of bad ideas, by me, @brainwane of Changeset
Consulting. That is my company. And I'd also like to thank
the CART provider. Thank you. [applause] For the live captioning
over to your left. So let's start with HTTP. I always, when I'm at the
beginning of one of these talks and there is any kind of
abbreviation like this, I have this pedantic need to know
exactly what it stands for, even if I already knew,
I really want the speaker to say what it -- it fulfills
some itch in my brain. So, I'm fulfilling this itch now. HTTP stands for
the hypertext transfer protocol. You've probably seen it
at the beginning of most URLs, uniform resource locators. And it is -- oh this is great. I love when there are people
laughing at my little jokes. Thank you so much for that. It makes me -- I also do
a bit of standup comedy, and in standup,
you're supposed to get at least 3 and up to 6 laughs in a minute,
and I've clocked up to 5.2. That will not be happening in this
talk, but I aim for at least one. So if 60 seconds have gone by, there
will be a punchline coming, I hope. HTTP is defined in an Internet
engineering task force (IETF) request for comment (or RFC),
which you can read. And this is, for instance,
a bit of RFC 7230, concerning message syntax
and routing, where you can see that the user agent sends a request
to a server and gets a response. You see there's ASCII art,
so it can't be that intimidating. I'm going to assume
that everyone in this room has actually written code
that interacts with the Internet, and especially with the web,
and especially with HTTP. But just a crash course to sort of
remind you of what's going on when you type GET or POST or whatever,
there is a client on the left. It may look like
a deformed green human head, but it's supposed to be,
you know, an API client, or an app or a browser
or something, and on the right is a sort of
ominous cubical -- sphere -- no, the other thing --
prism, a server. Anyway, so the client
sends a request, which is a kind of HTTP message to the server asking for something
or asking to do something, having to do with
the state of the resources that are available on the server, and then the server sends
a response back to the client. And you may be asking me, "Hold on,
what are requests and responses?" Let me tell you! I like the imaginary "you"
in my head that asks me questions
that the next slide answers. I hope that imaginary "you"
corresponds with many of you, and I'm sorry for any lack
of congruence there. An HTTP message, a request
or response, is text. It is text. It is text you can write,
it is text you can read, texty text text text text. I'm kind of curious
how the CART provider says... Anyway. Um... I'm just going to apologize
right now to the captioner. I gave you the slides ahead of time but I sure do say a lot of weird things
that aren't in the slides, don't I? (captioner)
You got that right. [applause] (Sumana Harihareswara)
For those of you who did not look at the captions, the captioner said
"Thank you" and "You've got that right." This has gotten
quite recursive and meta. [laughter] An HTTP message, a request and
response, is divided in three parts, much like Gaul,
for those of you who studied Latin. And the start line
is the first part, and some things that might be
in it: the HTTP version. In this talk, I'll be talking
about version 1.1. The previous session
in this very selfsame room talked about HTTP/2, which is
future and cool and Blade Runner. I won't be talking about that.
I'm talking about the old HTTP that's implemented everywhere,
and so you can take stuff from this collection of bad ideas
and implement them right now today. And then, within the start line,
you also have, for instance, if you're requesting something,
a method or verb, and a response status code, and I'll be going into
all of those more later. Headers might include
what kind of content it is or the length, or things
like that -- metadata, right? That stuff Snowden talked about. And there's a sometimes optional
body or payload. In this case it's a picture
of a chandelier. I think in general,
more message bodies should be pretty pictures
of chandeliers and not, like, I don't know, abusive, harassing
blog comments. Just an idea. So here's an example request. The start line: GET
(the verb, what we want to do), slash (this is an address),
the version of HTTP (1.1). Headers: here are just
some of the headers, so just some of the headers
that you'll see. Things like the host, what kind of content
we're willing to accept, User-Agent,
which I'll talk about more later. In this case the body is blank. There is no payload
because you don't need one. What is this request saying?
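Written out as plain text, the request described above might look like the string in this minimal sketch. The Accept and User-Agent values are illustrative assumptions, and `split_http_message` is a hypothetical helper, not part of any library.

```python
def split_http_message(raw: str):
    """Split raw HTTP message text into (start_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the start line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


# Hypothetical header values; only the method, path, version and host
# come from the example in the talk.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: sumana.biz\r\n"
    "Accept: text/html\r\n"
    "User-Agent: example-client/0.1\r\n"
    "\r\n"
)

start_line, headers, body = split_http_message(request)
method, target, version = start_line.split(" ")
print(method, target, version)
print(headers)
print(repr(body))
```

The body comes back as an empty string, matching the blank payload of a plain GET.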
This request is saying, "Please give me the root
that's at sumana.biz. "Give me the webpage that is
the index page of this website." Here is the response
that sumana.biz might send back. Incidentally,
I think sumana.biz exists and is like a spam site
in Thailand or something. The start line tells you
the version of HTTP, and "200 OK," a status code
and reason phrase. I'll talk about those more later.
Here are some of the headers that you might get back
that tell you how long it is, that it's HTML, the date,
and last modified, and then the body,
which is an HTML page saying that Sumanaville
is a pretty rockin' site. You can't see it, it's just
on my local host, so, sorry. Let's start by talking about
methods, verbs. Here are the really popular
request methods or verbs: GET ("gimme")
and POST ("here you go"). These are really well understood among web application developers
in a lot of ways. This is the set of verbs you get if you are doing stuff
in an HTML web form, so if you started doing
web application development with forms in HTML, you know,
in, like, Netscape in 1995, this is what you had; these are
the Dave Matthews Band of verbs. They're very, very popular,
but, you know, maybe you could go a little further
and, like, expand your horizons. I sometimes worry about that joke because I'm sure
they're a perfectly nice band, and in fact,
the Dave Matthews Band is the reason that the
GNU Mailman project exists. The Mailman mailing list software
was originally developed to manage the fan clubs
for the Dave Matthews Band. I know, isn't that amazing? Also, Mailman is having
a sprint during the PyCon -- I need to stop advertising
other things and do my talk. OK. [laughter] Here's the first bad idea
of this session. You can create an API
that allows POST but not GET. [laughter] There's more at this particular
GitLab repo, "HTTP can do that," but this is in Python 2. If you subclass some bits
of the standard library, BaseHTTPRequestHandler
out of BaseHTTPServer, then you really have --
it's very DIY. You have to implement
a method for -- that is to say, a function within this class
for every verb, every HTTP verb
you want to support. So you don't have to
implement do_GET, and therefore,
if someone does a GET, then they would just be like,
"I have no idea how to deal with that." Here are some use cases:
letters to Santa Claus. [laughter] Employee suggestion box. [laughter] Extremely moderated blog comments. [laughter] And let me just start now here
and put in a logistical note. You may remember that I said
this was a collection of bad ideas, but sometimes I will actually be
mentioning good ideas in this talk, which may blow your mind.
How will you tell the difference? This is what a good idea
feels like to me. This is a snapshot
of the Recurse Center, an educational institution
in New York City, and there's plants and whiteboards
and natural light, and thoughts written
in dry-erase marker that could be elegant and performant
and inspiring and enlightening. And here's Horror World.
This is what a bad idea feels like to me.
You shouldn't do it. It's a red arrow
pointing to the left. By the way, across the top
there's a transliteration of the phrase "horror world" into
Kannada, a language my parents speak. That's not how you say
"horror world" in Kannada. That's how you say "hor-ur wurld"
in Kannada. Sorry, captioner. [laughter] (captioner)
Sure... (Sumana Harihareswara)
This is the bad idea scale. [laughter in response
to captioner's remark] [applause] It goes from "Horror World"
to "natural light." And so giving the client no way
to GET is a terrible idea, and you should not do it
because if nothing else, you'll want Santa Claus
to be able to have, like, a web dashboard to look at the
letters, and that will require GET. Let's think about the things
you might want to do with data: create, read, update, and delete. Well, if you think about
what verbs you can use in HTTP to do these various things,
the really common ones, GET and POST,
you would read with GET, and then create and update
and delete with POST. You're using the same word
to do three different things. I find that inelegant. I did sort of a stamp there as like,
"Sumana disapproves this message." Or, set of messages. Anyway... Underappreciated methods,
let's talk about some of those. Outside the Dave Matthews Band
into, like, the Mountain Goats and, like, Neko Case of verbs. DELETE. DELETE is to delete a resource. That's what it --
it does what it says on the tin. So if you were to implement
DELETE, then, you know, maybe I would start up a bit of
server, serving at port 8000, the official port of localhost. And in a different terminal window, I open up an interactive
Python session. I import the requests library,
which is great. Let's have a hand for requests. [applause] Requests makes it extremely easy
for you to send HTTP requests of various kinds
and then inspect the responses. So I GET localhost:8000, and in the terminal window where
I was running serverwdelete.py, you can see the log saying
that that GET request came through with a code "200 OK"
being part of the response. So, just scrolling that down
so you can see more stuff. So here's the code
for serverwdelete.py, where there's a do_DELETE function, where if that file gets deleted,
I just call os.remove. There's no authentication
or authorization of any kind here. Do not implement this
in production. Despite the fact that there is
an OSI-approved license on this code, I really don't want you
to ever use this for anything other than pranks
and self-edification, please. The only bit of safekeeping here
is that the only file path that can be deleted here
is FileToDelete.txt. So, in another terminal, I say, "Hey, ls FileToDelete."
Yep, it's there. And then in that Python session,
I use the requests library and I delete that path, and the log says, "Yep, DELETE /FileToDelete, 204." 204 means there's now a successful
deletion that's happened. And now I do that ls again. "What? What file?
Never heard of that file." So it totally worked. Within the requests library,
you can, as I mentioned, look at and dissect
the response that you got. So, status_code: 204,
reason: no content, and that is the proper response
and reason phrase to give when a deletion
has successfully happened. Is this a good idea?
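A rough Python 3 sketch of the same DELETE idea follows. It is not the serverwdelete.py from the talk: it uses `http.server` and `http.client` from the standard library instead of Python 2 and requests, and the temp directory, the `/FileToDelete.txt` path and the `send` helper are assumptions made for illustration. It also shows what happens to a verb with no `do_` method, which is the mechanism behind the POST-but-not-GET server.

```python
import http.client
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical setup: one throwaway file is the only thing that can be deleted.
WORK_DIR = tempfile.mkdtemp()
DELETABLE = os.path.join(WORK_DIR, "FileToDelete.txt")
with open(DELETABLE, "w") as f:
    f.write("delete me\n")


class DeleteHandler(BaseHTTPRequestHandler):
    def do_DELETE(self):
        # No authentication or authorization: a sketch, not production code.
        if self.path == "/FileToDelete.txt" and os.path.exists(DELETABLE):
            os.remove(DELETABLE)
            self.send_response(204)  # No Content: deleted, nothing to return
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def send(method, path, port):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status, resp.reason


server = HTTPServer(("127.0.0.1", 0), DeleteHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

first = send("DELETE", "/FileToDelete.txt", port)
second = send("DELETE", "/FileToDelete.txt", port)
unsupported = send("GET", "/FileToDelete.txt", port)  # no do_GET defined
server.shutdown()
server.server_close()

print(first, second, unsupported)
```

The second DELETE finds nothing left to remove, and the GET is rejected by the base class because the handler never defined `do_GET`.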
Kind of and maybe, right? Don't just implement DELETE
willy-nilly, and then be surprised when a random cracker deletes
lots of files that you cared about. But within a bit of a walled garden
where there's a lot of micro-services and programmatic creation
and reading and updating of resources, maybe with proper auth
you want to implement DELETE so that it's also possible
to programmatically delete resources. Here's another
underappreciated method. It's PUT, AKA "here you go." "But wait," you might ask,
imaginary listener, who might also be
the people in this audience, "I thought you said
POST meant 'here you go.'" Well, that leads us
to the question: "Well, Socrates, what is the POST?" "Well, Glaucon, I'm glad you asked." The standard, right, the RFC says
that POST basically means "This is above our pay grade;
take it to the boss," also known as overloaded POST. And the idea is, there is a web application,
perhaps written in Python, that contains the logic
of what to do with a POST that contains these characteristics
at this path, and so on. Often our convention seems to be,
our idiom seems to be that we use POST
as POST-to-append, meaning "create a new item
in this set." Let me show you an example. Here, the sample payload or body
is pictures from a museum that I went to
at a textile museum in England. These are cards in a Jacquard loom
that are punch cards that are the forebears of the
punch cards that we ended up using for mid-20th-century computing. So, PUT versus POST. These are identical requests
except for the verb. On the left, PUT /cards/5. PUT with this body means "this picture
should go at that address." It is unambiguous
about changing the state on the server of the resource. POST means -- tells the web app
that this picture applies to /cards/5 somehow;
you figure it out. This gives you more flexibility, but
it also makes future debugging more ambiguous for you and other maintainers
of this code. So you could use PUT
for "create and update," or some combination
of PUT and POST, and you can use GET for "read"
and DELETE for "delete." And that would make things
a little bit more granular. And PUT is a very good idea.
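The difference between PUT and POST-to-append can be sketched with a toy in-memory collection. `CardStore` and its method names are hypothetical, and the status codes returned are one reasonable choice rather than the only valid one.

```python
import itertools


class CardStore:
    """Toy in-memory resource collection (hypothetical, for illustration)."""

    def __init__(self):
        self.cards = {}
        self._ids = itertools.count(1)

    def put(self, card_id, body):
        """PUT /cards/<card_id>: the body becomes the resource at that address."""
        created = card_id not in self.cards
        self.cards[card_id] = body
        return 201 if created else 200

    def post(self, body):
        """POST /cards: the application decides where the new item lives."""
        card_id = next(self._ids)
        while card_id in self.cards:  # skip addresses already taken by PUT
            card_id = next(self._ids)
        self.cards[card_id] = body
        return 201, f"/cards/{card_id}"


store = CardStore()

# Repeating the same PUT leaves exactly one resource at /cards/5.
put_first = store.put(5, b"jacquard card picture")
put_again = store.put(5, b"jacquard card picture")
count_after_puts = len(store.cards)

# Repeating the same POST creates a new resource each time.
post_first = store.post(b"another card picture")
post_again = store.post(b"another card picture")
count_after_posts = len(store.cards)

print(put_first, put_again, count_after_puts)
print(post_first, post_again, count_after_posts)
```

Sending the PUT twice is harmless, while sending the POST twice leaves two new items behind, which is the ambiguity the talk points at.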
You'll see it used in APIs and API client libraries
right now in lots of places. So I suggest
that you investigate PUT. Another underused method is PATCH. That's "update just part
of this document or resource." It's a good idea.
It takes a little bit more learning on how to do it
but you can absolutely use it. OPTIONS: ask what verbs
the client's allowed to use for a specific path
or server-wide. This is especially useful
when you're exploring how to use an API
that's new to you. So, OPTIONS.
Very good idea. Try it out. Here's a super cool method. If you walk away with nothing else
but this from this talk, I will be quite chuffed,
quite pleased. HEAD is like GET,
but just for metadata. So here is a comparison. So on the left is a GET request
and on the right is a HEAD request. And they're exactly the same,
except that the respon-- of course, the verb and the
response for the second one, you don't get the body,
the payload. You just get what the start line
and headers would have been for a GET request of this type. And this is going to save you
a bunch of time, right? This is going to save
the server time in creating
and sending you the body, and it's going to save you
network bandwidth time. So here's an example.
I start up IPython, right? I import requests. I get a URI.
This is in this case some picture off of Wikimedia Commons,
right, some JPEG. And then I use the %timeit
command within IPython, which tries something three times
and then tells you what the best time was. This is not
proper data science, right? This is an anecdote.
This is artisanal data. But it's still useful
because you can see that when you GET that,
it took 1.33 seconds, but when you HEAD,
it took only 163 milliseconds, and that can really add up. Think about the times when you're
writing a web application and you actually --
or you're calling some API and you don't need the whole body,
because all you need to know is, does this resource exist
at this path? Do I have permission to get it?
How long is it? When was it last modified?
And so on and so on. You can use this, for instance, when creating a table of contents
or something like that. And it will save you a bunch of time
and it will save a bunch of work for the server that you're calling.
So HEAD is a fantastic idea. It's implemented
in lots of places. Play with it.
See how it saves you time. Let's talk about headers. So, some popular headers
include Content-Type ("what kind of thing is this?")
and Content-Length. So, Content-Type
is also known as MIME, or Mime, and since it's also known
as Mime with lower case, I'm not going to tell you
what it stands for. Interchange, blah, blah. Some of the content types
you might have seen, they're sort of arrayed
in these hierarchical -- you know, there are these sort of
top level name spaces, and then things below that.
So there's text: text/html, text/plain. Application: application/json or /xml. Chemical. Seriously. There is a top-level
chemical name space. Hats off to you, chemists. Sometimes you see
these call-and-response pairs. A request says,
"Here's what I'm willing to accept," and then the server says,
"Here's what I'm willing to give you." Accept-Encoding, Content-Encoding;
Accept-Language, Content-Language. If-Match, If-None-Match, ETag. So the ETag is sort of a versioning
system for individual resources, so that there is a particular number,
kind of like a git log hash, that changes
when the document changes so that the requester can say,
"Well, if this matches such and such, then give it to me, otherwise not."
So this is a way to save time. Similarly, the Cache-Control header,
and If-Modified-Since, If-Unmodified-Since,
and Last-Modified are ways of saying
"Hey, update this. "Here's a post to this resource, "but only if I'm pretty sure
that the resource "that you currently have is --
here's the last time "that it was modified."
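The decisions a server makes for these conditional requests can be sketched as two plain functions. This version uses ETags with If-None-Match and If-Match rather than dates, and it is a simplification: real headers can also carry lists of tags, `*`, and weak validators. All function names here are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Assumption: the validator is a truncated content hash in double quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(current: bytes, request_headers: dict):
    """Return (status, body): 304 and no body if the client's copy is current."""
    if request_headers.get("If-None-Match") == make_etag(current):
        return 304, b""
    return 200, current


def conditional_put(current: bytes, new: bytes, request_headers: dict):
    """Return (status, stored_body): refuse updates based on a stale copy."""
    expected = request_headers.get("If-Match")
    if expected is not None and expected != make_etag(current):
        return 412, current  # Precondition Failed: it changed under the client
    return 200, new


v1 = b"<h1>Sumanaville</h1>"
v2 = b"<h1>Sumanaville, now with chandeliers</h1>"
etag_v1 = make_etag(v1)

fresh = conditional_get(v1, {})
unchanged = conditional_get(v1, {"If-None-Match": etag_v1})
changed = conditional_get(v2, {"If-None-Match": etag_v1})
safe_update = conditional_put(v1, v2, {"If-Match": etag_v1})
stale_update = conditional_put(v2, b"<h1>my edit</h1>", {"If-Match": etag_v1})

print(fresh[0], unchanged[0], changed[0], safe_update[0], stale_update[0])
```

The stale update is rejected because the stored document no longer matches the version the client says it edited.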
So it's a way to avoid conflicts. The Cache-Control header is interesting
because it lets you be more granular. By the way, you technically
could do conditional GETs where you say,
"Only give this to me "if it's in a cache
on the server side." Who here is a vegan? A few vegans. Right, so you could
sort of, like, be a data vegan where you live lightly
upon the Earth by only grabbing things
that are in caches and thus not causing
new work for the server. [laughter] Tell me how this goes. Here's a popular header, User-Agent, a way of saying, for instance,
"I am such-and-such a client library," "I am such-and-such an app,"
"I am such-and-such a browser." Here's an unpopular header, From: the email address
of the person making the request. This is actually really in the spec.
I am not lying to you. I'm sure someone would, like,
shout at me if I was lying. This is that kind of, you know,
integrity-based audience. Otherwise known as hecklers. This is seriously --
I think it's basically a holdover from when you were a sysadmin
and you got, like, ten hits a day. So, you know, you would look at it
in your logs and you would say, "Oh no, "my Perl 0.1 server crashed earlier
and Allison wanted that one page. "I should email Allison
and tell her the site is back up." You could use this
for really, really bad auth: "Yes, you're only allowed
to get this resource "if you got to type
one of the email addresses "in a request you control." [laughter] You could use it as a guest book, like, "Yes, I saw your site launch." [laughter] Coded messages meant for
the network surveillor. [laughter] I bet you so much that in the logs
of every website you've ever run, no one has ever
legitimately used From. So you probably aren't even
capturing it in your logs, but the NSA is. So, you know, you could have
"From MiddleFinger@Snowden.org." [laughter] These are bad ideas. Don't do them. This is a direct route
to Horror World. Here's another spy trick. Header fields
are case-insensitive. So you could vary the case
of the headers you send and have, you know, an encoding
in almost a steganographic way. This is a terrible idea. No matter what your threat model
is -- I'm just imagining this diagram where there's
a bunch of different boxes and they all just point to
"They will figure it out." All right,
here's a popular header: Host. And in fact it's a required header.
It is required in request messages. And I'm going to show you here
something that is not Python but that is useful anyway, as so many
non-Python things turn out to be. And it's Netcat, which is built into
basically most GNU/Linux systems that you're going to have
in a lot of development environments. It's a way to directly write,
just in your terminal with your keyboard, HTTP messages. So you invoke Netcat and tell it
what domain it's going to be pointed at and what port -- in this case,
myhostname.tld and 80. And here's the start line,
GET /bicycle (just a sample path), HTTP/1.1 (the version),
Host: myhostname.tld. That's the header bit, right? So here's Host and path
working together. This is a web page;
it's my local bookstore. At the top is the URL bar. Astoriabookshop.com/event/storytime-73. Isn't it nice
they've had so many story times? And then below that,
I've opened up the developer toolbar and the request headers say
GET /event/storytime-73 as part of the start line,
Host: www.astoriabookshop.com. So if you concatenate those
together, you get the entire URL. Similarly, here's a BBC news page: "Do butterflies hold the answer
to life's mysteries?" I have cruelly covered up the answer
with the developer toolbar. [laughter] I'm denying you both lunch
and butterfly mystery solutions. Truly, I am a terrible speaker. So here is the URL bar,
bbc.com/news/magazine-[number]. And then you see again
the request headers, the start line, GET, and the stuff
that's coming off the root, and the host is www.bbc.com. Hold on, why do we need
to repeat this if it's in the URL? This is inelegant, isn't it?
"INELEGANT!" The stamp. Look at this.
Netcat has been told now twice, both in the invocation
and in the Host header, that it's myhostname.tld. Surely, every other person
who's ever thought about the internet has not made this redundancy
without thinking about it. You are correct. There is a reason. HTTP is separate
from the domain name system. The first time
when we're invoking Netcat and saying myhostname.tld, that's telling Netcat
and telling the operating system, "Hey, you're going to be using DNS
to look up the IP address "that is correlated
with this particular domain name." And so therefore,
if you're in a situation where you're just dealing
with bare IP addresses that you're sending stuff to,
then the messages will still work, and possibly, you know,
more saliently to you, Host helps route requests
among different domains that sit on the same server,
otherwise known as subdomains or virtual hosts.
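Here is a minimal sketch of that routing: one local server, reached by bare IP address, picks a site purely from the Host header. The `.test` host names, the `SITES` table and the 404 for unknown hosts are assumptions for illustration. A real server might instead fall back to a default site, as in the debian.org example that follows.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sites; the ".test" top-level domain is reserved for testing.
SITES = {
    "www.example.test": b"main website\n",
    "bugs.example.test": b"bug tracker\n",
    "wiki.example.test": b"wiki\n",
}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Drop any ":port" suffix and compare case-insensitively.
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host)
        status = 200 if body is not None else 404
        if body is None:
            body = b"no such site on this server\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch(host_header, port):
    # Connect by IP address; only the Host header names the site we want.
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/", headers={"Host": host_header})
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, body


server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

wiki = fetch("wiki.example.test", port)
bugs = fetch("bugs.example.test", port)
unknown = fetch("sumanarific.example.test", port)
server.shutdown()
server.server_close()

print(wiki, bugs, unknown)
```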
So here are some examples. Here's www.debian.org,
the main website. Here's bugs.debian.org,
the bug tracker; lists.debian.org,
the mailing list; wiki, the Wiki, the knowledge base;
and so all of these come off the root, and so you would say, "Host wiki,"
or "Host bugs," and so on. Here's something
interesting you can do. In requests, you can tell it
arbitrary headers to put into a request,
using the requests Python Library. So here I requested debian.org and I mentioned that the host
I wanted was "sumanarific." There is no sumanarific.debian.org,
I believe. Maybe there ought to be. And here's the text
of the response; it basically says, "Hey, welcome to mirror-csail!" It's kind of an "I just
installed this server" page, so probably, you know, you should
watch out for that sort of thing. Here's a spam story. I installed Drupal once.
We all make our youthful sins. It was fine, it was fine.
Drupal is a perfectly fine project. OK, so the next day,
I went to my 404 logs and I saw that there was
"page not found" for a website that wasn't mine.
That seemed odd. That was contrary to my understanding
of how the web works. I have anonymized this.
I think they're a little bit more subtle these days than making
their websites "myphishingsite.biz." And I thought, "What in the world?"
And I looked in the logs and I saw that the log said
that the GET request was for http://myphishingsite.biz. And the thing is, you might think, "Oh, well maybe that's just
a legitimate error "of someone who started typing
in the URL bar "and forgot to take out the other
URL that was already there." But if that happened,
it would start with a slash. Everything here is going to
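That observation can be turned into a small log filter. This is a sketch under two assumptions: the server is an ordinary origin server rather than a proxy (proxies legitimately receive full URLs as targets), and the log has already been reduced to bare request lines. The sample lines and the `suspicious_targets` name are made up for the example.

```python
def suspicious_targets(request_lines):
    """Return request lines whose target does not come off the root."""
    flagged = []
    for line in request_lines:
        parts = line.split()
        if len(parts) != 3:
            flagged.append(line)  # not even "METHOD target VERSION"
            continue
        method, target, _version = parts
        if target.startswith("/"):
            continue  # ordinary request for a path on this server
        if method == "OPTIONS" and target == "*":
            continue  # the one standard target with no leading slash
        flagged.append(line)
    return flagged


# Hypothetical request lines, as they might appear in an access log.
log = [
    "GET / HTTP/1.1",
    "GET /event/storytime-73 HTTP/1.1",
    "GET http://myphishingsite.biz HTTP/1.1",
    "OPTIONS * HTTP/1.1",
]

flagged = suspicious_targets(log)
print(flagged)
```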
come off of the root. All legitimate requests
are going to start with a slash. So then I started playing around with --
"What in the world is going on here?" You know, I thought I was going
to spend that week, you know, customizing my Drupal install,
and instead I entered, like, this murky world of spam
and web malformation. And I was able to intentionally
malform my request to match what I saw in my logs. So if you use Netcat, right, OK, you tell Netcat,
"Invoke myhostname.tld," but then the GET request goes to
http://spam.com and Host spam.com, just to make sure
that spam.com shows up a few different places
in these 404 logs, or if you want it to look
a little bit more legit, then Host spam.com,
but GET something that looks like it might have come off of the root
and you just were looking for a resource that wasn't there,
like /viagra-bitcoin. I would be really surprised
if there were any resource anywhere on the web
called viagra-bitcoin, but "there are more things
in heaven and earth "than are dreamt of
in my philosophy." This is a terrible idea. If you
learn of an interesting trick from spammers,
probably you shouldn't do it. That's just a Sumana's Life Lesson. You can define your own header. There's a set that is defined
in these RFCs, but header fields
are fully extensible. There is no limit on the
introduction of new field names, each presumably
defining new semantics, nor on the number of header fields
used in a given message. Usually if you are coming up
with your own header, whether it's a request header
or a response header or both, you prepend X-
so that people understand that this is something
that you made up. Here is an example. Wikipedia and the other Wikimedia
projects have X-Wikimedia-Debug. This is an HTTP request header
that their developers can use in order for a particular request
to choose a particular backend or Varnish,
to choose caching behavior, to record a trace
and profile a request, to add and look at
particular debug logs, to go into read-only mode,
and there's browser extensions they've created
for Chrome and Firefox so that you can say,
"For this particular request, "do these things
in the X-Wikimedia-Debug header." I think this is super cool.
There's more at this address. So here's an example
from a response side instead of a request side.
This is a Flask app I made that remixes the names
of physics articles on Wikipedia to come up with
sci-fi novel titles, like "Lorentz Relation"
and "Ambiguity Particle." The code for this is available.
If you end up using this to name your sci-fi novel,
I will be bewildered and pleased. So this is a Flask app.
So I added some code here. Why don't I zoom in on the
specific code that changes a header. You add a new header here
with X-Sumana-Is-Amazing, the value being
'Indeed, Verily So'. And then, if you look at
the response to a GET request and you zoom in, you see: X-Sumana-Is-Amazing:
"Indeed, Verily So." And it looks so official when it
shows up in the developer toolbar. It looks like it must be true. [laughter] It makes me feel amazing
to see that, actually. You should use this for days
when you need better morale. The code is up,
for all the good it'll do you. Is this a good idea?
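The response-header trick does not depend on Flask. As a rough sketch, here is the same idea as a bare WSGI application using only the standard library. The `app` and `call_app` names are hypothetical, and this is not the code from the talk's repository.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app that attaches a made-up response header."""
    body = b"Lorentz Relation\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Sumana-Is-Amazing", "Indeed, Verily So"),  # the custom header
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(wsgi_app, path="/"):
    """Call a WSGI app directly, without a network, and capture its response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app)
print(status)
print(headers["X-Sumana-Is-Amazing"])
```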
If you're Wikipedia, maybe. Look at the cool things they've
done with that bespoke header. But be careful that you don't
sort of write an alternate Internet in an unnecessary way. There's a lot
that's already built into HTTP. You don't necessarily need to define
your own header all the time. Don't Balkanize things. Let's talk about status codes. So, 100 and 101 are informational. That's "Oh yeah, go on." 200 series, "Success." The 300 series,
"Redirecting you." 400 series,
"Client error, you screwed up." 500 series,
"Server error, I screwed up." So these are the sort of five --
like the five kingdoms of like, you know, animals
and plants and fungi and stuff. These are the five kingdoms
of response codes. And technically, status or response codes,
there's two parts. There's the code,
which is that three-digit int, and there's a reason phrase,
which is, you know, these English words
that are standard. The spec -- the RFC says
that a client should ignore the reason phrase content,
so you're on notice. You should just be
looking at that int. Here's a few that are really cool
that not enough people know about. 410 Gone. It was here, now it's not. So that means, for instance, "Hey, I recognize
that you are going to the address "for that April 2012 coupon. "It was there. I recognize that fact.
I am not gaslighting you. "You are right
that there was something there. "However, it's gone
and it's not coming back." I feel like this is just
a more respectful thing to do than 404 when you know
that a resource used to be there. 304 Not Modified.
This is a response to that conditional GET
I was talking about earlier. "You said, 'GET this
if it's been modified "'since such and such a date,'
and it hasn't been." So I love that there are
very economical ways to say these kinds of things
that actually provide a tremendous amount of semantics
for any API client or for other parts
of your web app. 451 Unavailable For Legal Reasons. Server is le-- this is part of
the Internet, this is part of HTTP. Server is legally required
to reject the client's request, whether that's GET, POST, whatever. "Can't let you see that;
it's censored." So for instance,
the government of India disagrees with everybody else
in the world regarding the borders of India, so there are maps of India
that are, you know, accurate that you're not allowed to distribute
to Indian IP addresses, basically. Yeah, 451 is pretty useful. And there's the DMCA request
and this and that and the other. You might ask, "Why is this
a client-side error? "Why is this in the 400 block?" In "RESTful Web APIs" by Leonard
Richardson and Mike Amundsen, they suggest: "This is considered
a client-side error "even though the request
is well formed "and the legal requirement
exists on the server side. "After all, that representation
was censored for a reason. "There must be something wrong
with you, citizen." [laughter] This is straight up
part of HTTP now, because as of February 2016,
this is a standard. And you can use it,
and it's a good idea. And again, like 410 Gone,
I think it's a respectful thing to do for your users
so you don't gaslight them into thinking,
"Oh, I guess that doesn't exist." No, it exists, and a government
won't let us help you. I think that's a bit of
consciousness-raising you can do, and it's just straight up
more accurate and more precise. Now some WTF responses. All of these were found
in the wild. There were real, running servers
that gave these as responses to just a straight up GET
on the root of a website. "Code: 126,
Reason: Incorrect key file," and then a bunch of SQL suff.
Don't let attackers see this! [laughter] "301, Reason:
explicit_header_response_code." I think someone forgot
to substitute a variable here. 403, which is Forbidden. "You've got to ask yourself
one question: Do I feel lucky?" OK, that's witty.
All right, you get a pass. This one, though... [laughter] "Reason: can't put wasabi in bed." I will buy
a Portland area microbrew for someone who can help me
understand this one. [laughter] "404, Reason: HTTP/1.1 404." That's not a reason,
that's the same thing again. 404 Not Found,
and then some PHP code again. Don't show this to your attackers! "200, Forbidden."
No, we have another code for that. [laughter] 403 is Forbidden. Don't be that
person who starts saying "goodbye" instead of "hello" because, "Like,
it's all socially constructed, man." Yes, it is a social construction,
and we depend on it. Do not break language. This has been
a public service announcement. [laughter] "404, Apple WebObjects." Apple WebObjects may be an error,
but it's not 404. "404, forbidden."
Again, we have 403 for that. 434, which doesn't exist; reason: completely nonsensical. "451, unknown reason." No, we actually do know!
It's the legal one. "503, Backend is unhealthy." My butt is fine, thank you. "520, Origin Error,"
so not a real thing. "525, Origin SSL Handshake error." I think this basically
was a Cloudflare error that end users are never
supposed to see. Well, whoops. 533 -- my friend Audrey Eschright
said, "Is this an SAT question?" [laughter] "732, Reason --" OK, that's, like, beyond the five kingdoms
I mentioned, right? Seven? And then this copyright page
is the reason? "999, Request denied."
It has been brought to my attention that 999 is an emergency phone
number in much of the world. Still don't do this. If you want to do this:
changing your reason-phrases. So here is some Python 3 code.
We're later in the talk now, so we've gone from Python 2
to Python 3, as y'all should do. You import http.server,
part of the standard library, you subclass some things,
and you just change this dict so that responses[200]
corresponds to "Oll Korrect" or "Oh Kay,"
which are apocryphally the words that OK
is an abbreviation for. The code is online,
and I go to localhost, and indeed, the status code
that comes back is "200 Oll Korrect." Don't do this!
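[Editor's sketch] What follows is a minimal reconstruction of the approach described above, not the actual code from the talk. It assumes the standard library's http.server.BaseHTTPRequestHandler, whose class-level responses dict maps each status code to a (short phrase, long description) pair. The names OllKorrectHandler and fetch_reason_phrase are hypothetical, chosen only for illustration.

```python
import http.client
import http.server
import threading


class OllKorrectHandler(http.server.BaseHTTPRequestHandler):
    # Copy the class-level table so the parent class is left untouched,
    # then swap in a new reason-phrase for status 200.
    responses = dict(http.server.BaseHTTPRequestHandler.responses)
    responses[200] = ("Oll Korrect", "Request fulfilled, document follows")

    def do_GET(self):
        body = b"hello\n"
        # With no explicit message, send_response() looks the phrase up
        # in self.responses, so the overridden phrase goes on the wire.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_reason_phrase():
    """Start a throwaway local server and return (status, reason) for GET /."""
    server = http.server.HTTPServer(("127.0.0.1", 0), OllKorrectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        conn.close()
        return response.status, response.reason
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_reason_phrase())
```

Clients are expected to act on the numeric code rather than the phrase, which is why a prank like this tends not to break anything.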
Don't break language. This has been a repetition
of my public service announcement. There's so much more.
There really -- I could have spoken
for the length of PyCon on amazing things
you can do in HTTP. You can say specifically,
"Don't cache this. Hey browsers, "this is a fast-moving news event.
Do not cache this document." You can pass instructions
directly to the server or client. There are these methods
I didn't get to talk about. There's a specific response
that says, "Uh, state change, "conflict, sure you want to do
this overwriting business?" There's look-before-you-leap
requests, where you say, "Well, do you think -- hey server,
if I posted this five-gig file "to this address, would that work,
would that be OK for you?" And the server can say,
"Yeah, come on down," or "Uh, no." Resources at HTTPS
and HTTP URLs can differ. HTTPS is not just HTTP plus TLS. There can be entirely different
addressing schemes. You can rank your preferences
in sort of like this ranked preference voting
using a "q" in the Accept headers. Content-Disposition, that is
the header that lets you say, "Treat this like an attachment."
There is so much more. And studying HTTP,
studying the way that it works and understanding the "why"
of why it works, it gave me a feeling of power; it gave me a feeling of
increased capability, for sure; but it also gave me
a sense of wonder that we're standing
on the shoulders of giants. If you look in Python's implementation
in the standard library, there's a comment
at the top that says, "Well, this is in accordance
with these specs," written by people like
R. Fielding, Roy Fielding, one of the people
who created HTTP. And I love that we all get to learn
from each other in this way and build on top of work
done by others that helps us see new ways of thinking about resources,
about state, about representation, about economically saying
complicated things. I had a real sense of amazement. And it makes you think,
if we had used more of these built-in semantics instead of sometimes reinventing
the wheel and implementing stuff -- in scripting languages on top of it,
what might the web have been? But also, we still have a chance;
what might it be? The web is still young.
What will you do? Well, you can read and play. RFCs 7230 through 7235
are the definition of HTTP 1.1. Go ahead and read them. They are
actually not that intimidating. Like I said, there's ASCII art. Use the requests library in Python. Use tools that are available
on the command line like Netcat, Wget, netstat, and Telnet to play with bare messages
on the wire. Basic HTTP servers
in your favorite language are available.
Go ahead and read that code. It's actually a great way
to understand what is going on when you send and receive
requests and responses. And you can look at the code --
again, for personal edification and pranks only please --
from this talk on GitLab. I'd like to thank
Leonard Richardson, Greg Hendershott, Zack Weinberg, The Recurse Center,
Clay Hallock, Paul Tagliamonte,
and Open Source Bridge for supporting me
in the development of this talk. I'd also like to thank
Julia Evans, Allison Kaptur, Amy Hanlon, and Katie Silverio for the example
that they provided me in projects and talks
that are ridiculous bad ideas and fantastic for play,
exploration, and learning. Thank you. I'm ready for questions. [applause] (host)
So we do have a few minutes for questions. If you're leaving the room,
please tiptoe and be quiet. (Sumana Harihareswara)
I'm going to thank the captioner again while people line up.
I'm sorry, again, and thank you. [applause] (captioner)
You say that now... [laughter] (Sumana Harihareswara)
I don't know where you live, but maybe I can buy you a beer,
or like an Internet beer or something. (captioner)
No way. I'm heading home as soon as possible. (Sumana Harihareswara)
Go ahead, please, ask your question. [laughter in response
to captioner's remark] I'm sorry, please go ahead
and ask your question. (audience member)
I just got upstaged by the captioner. That's awesome.
I was wondering if you know where the number 451 comes from. (Sumana Harihareswara)
Yes, I do know where the number 451 comes from, and I think you're
guessing the correct guess, which is that it's based on
Ray Bradbury's classic novel Fahrenheit 451, and that is
definitely a very cool thing about it. I like to sort of leave it
as a bit of an Easter egg, but, like, I'm also happy
to open up the Easter egg and say, "Yay, chocolate!" [laughter] (audience member)
I remember seeing somewhere this idea -- (Sumana Harihareswara)
I'm sorry, could you stand closer to the mic so I can hear you? (audience member)
I remember reading somewhere that POST and PUT,
that one of them is usually used for, like, idempotent operations
and the other one not. Is that standardized anywhere,
or is that just common practice? (Sumana Harihareswara)
My understanding, and I'm totally willing to be told
that I'm incorrect on this, is that PUT simply says, "Here is the resource to go
at that address" and then, you know, POST is a little bit more
ambiguous in this way. And then I feel like I would want
to do a little bit more of a survey to understand what is -- what we're
going to call common practice. I think that's something
that is in a sense a matter for sociologists, almost. But I'm not sure.
That's really my answer to you. We have a few more minutes. If the captioner wants to, like,
impugn me some more, quite rightly so,
you are welcome to do so. (captioner)
No. You were entertaining. [laughter] (Sumana Harihareswara)
Aw! Thank you. Oh, we have one more question.
Go ahead. (audience member)
Could you talk a little bit more about when it is or isn't
a good idea to add a custom header? Like, what are some more
use cases for that? (Sumana Harihareswara)
Oh, I see, OK. Ideas for custom headers. So I think that one example
that comes to mind, by the way -- I think it's good
because my definition of use may be a little bit wider
and more fluid than some other people's,
but on debian.org, they have a response header
so that every single response coming from www.debian.org includes an X-Clacks header which mentions
Terry Pratchett, the author. Terry Pratchett
wrote the Discworld novels, which include a clacks system,
C-L-A-C-K-S, that is very similar
to our telegraph or Internet, and it is the custom
among the clacks operators that when one of them dies,
they keep saying that person's name in clacks signals
throughout the system, because as long as your name
is being spoken, then you never truly die. So I think that's useful because
I think it's really important to remember our history and people
that we want to honor and value. In a somewhat more functional,
production-oriented, "I need to get VC money
for this" way -- unless someone is going to be
offering me VC money for just remembering Terry Pratchett
as a service or something. I made my session chair laugh!
I'm so happy. I think that an example
of a good header is one where the semantics
and sort of the domain knowledge of what you're doing
in a particular sort of ecology between a server
and perhaps particular clients is one where you want
a fine gradation of response, so that -- I mean, you can sort of
do this with, like, User-Agent. You can say, "Oh, User-Agent,
give a different response "to different user agents"
or something like this. But if there is a particular
kind of difference that you want people
to be able to trigger, then I think that a bespoke header
can be useful in this way. It seems like performance-y things, when there are different kinds
of tradeoffs that people might want about, like, performance
versus ease of use or convenience or security
or something like that, and you want the client to be able
to programmatically choose where on that spectrum
they want the response to be and the processing time to be,
that might be good. (host)
And that's all the time we have. Let's thank Sumana again. (Sumana Harihareswara)
Thank you. [applause]