CS50 2017 - Lecture 6 - HTTP

[MUSIC PLAYING] DAVID J. MALAN: All right, this is CS50 and this is lecture 6 and as you may recall, today we begin to transition away from this low-level world of C and command line programming into a domain that's probably a little more familiar, that of the web, and yet all the ideas that we've been exploring thus far like functions and loops and conditions and so forth are still going to be relevant. It's just we're going to start using slightly different syntax and the user interface, or UI, is now going to be your browser instead of a black and white terminal window with just a simple textual prompt, but how did we get here? Well, recall that we've looked recently at structs and what was nice about structs in C was that we had the ability to make our own custom data types and to kind of encapsulate together related data, and that became pretty powerful when it came time for forensics to actually manipulate bitmap files or JPEGs, and even though this struct is way more complicated than a student structure, at the end of the day it's just individual data types that are all somehow interrelated, and by putting them in a struct you can move them all around and copy them and save them all together as you might have done as well, but then most recently did we introduce a somewhat fancier structure, which is still the same idea. It's got like one or more things inside of it, but now, more powerfully, one of those things had this star or asterisk, which gave us, of course, a pointer or an address, but what was so powerful about this simple idea and this seemingly simple symbol is that now we can kind of stitch together in our computer's memory any kind of structure we want. It doesn't have to just be one entity. 
It can somehow be linked to another and you can keep linking these structures together as well, and this of course was an improvement on perhaps our simplest of data structures early on, an array or a list, but of course, as soon as you have pointers can you begin to link things together until we got something like this and perhaps now with the dictionary implementation you yourself might be exploring a linked list, a hash table, a trie or some variant in between. And then lastly, there was this painting of a picture, whereby this is your computer's memory put a little more descriptively, and this is germane only insofar as your computer uses different chunks of memory differently. All of your function calls end up using the stack. All of your uses of malloc and its cousins end up using the heap, and then of course there's this up here, and what was the text segment, which we didn't really dwell on? What was the text segment all about? Text-- you're being volunteered. Yes, what's the text segment? AUDIENCE: Files information? DAVID J. MALAN: Files information, yeah. Specifically, the 0s and 1s that compose the actual program. So when you compile your source code, like hello.c, into 0s and 1s, those end up getting stored in this location in memory while the program is running. So long-term they're stored on disk or your hard drive or whatever's inside of the computer or the server so that the files persist even when the power goes off or you walk away from the keyboard, but as soon as you double click a program on your Mac or PC or as soon as you do ./hello or some command like that, those same 0s and 1s get loaded into your computer's RAM, the picture we keep showing, and that's where they live while they're in use by your Mac or PC or the actual server, but thus far we've been running all of these programs with something like ./hello or some similar command and running them just in the so-called terminal window. 
But you are probably most familiar, certainly with more graphical apps on your phones these days and any time you visit a web browser on your phone or on your desktop or laptop, you're still interacting with a program. It's just that program is not only running on your Mac or PC. Your browser is, like Chrome or Edge or Firefox or Safari or whatever it is you use. That's running on your Mac or PC, but what you're communicating with is a program elsewhere, somewhere else on the internet and those programs are called web servers. A web server is just a piece of software that some human or humans wrote and their purpose in life is to serve web pages. When you request the homepage of Facebook, there is a server out there, a program someone wrote, that essentially spits out the 0s and 1s that compose Facebook's homepage, but nicely enough, those 0s and 1s are not written as 0s and 1s by Facebook engineers. They're actually written as something a little more English-like, a little more familiar, and it's not even programming code, per se. It's what's called markup language, and we'll soon see that and more today. So we've gone from compiling your code and running it like this to actually doing that in a web-based environment, but of course, when you're running your own programs in CS50 IDE down here, you're actually using another piece of software that fills the screen, CS50IDE, a.k.a. Cloud9, which is the program essentially running somewhere in the cloud. And we'll start to make this distinction and through examples will the distinction among these different types of software begin to make sense, but where is something like CS50IDE running? Where is Facebook running? Where is Google.com running? Well, back in 1998, Google.com was running on this. This was Larry and Sergey, the founders of Google's, very first implementation, apparently, of their first rack of servers. So servers are generally stored literally in a rack like this. 
It's usually like 19 inches wide by convention and you just stack computer on top of computer on top of computer, but things were very bare bones back in the day of Google and so there weren't even plastic or metal cases around a lot of their computers. They were trying to minimize cooling, minimize cost, presumably, and cram as much hardware into that footprint as they could, and so you actually see a lot of the wires and hardware kind of sticking out, and this is on display now out west. Of course these days, fast forward just a decade or two, and this is one of Facebook's data centers where it's the exact same idea, but much fancier, much prettier, much better lit servers, but who serve, at the end of the day, the exact same role. There are bunches of servers around the world that are just sitting there waiting for you on the internet to make a request for a homepage, for an email, for any other type of information so that it ends up getting sent from server to client. And in fact, if you've ever thought about those words server and clients, which is probably the lesser used of the two, but a server-client relationship is what you have when you go into a restaurant, and you ask the waiter or waitress for something to eat. He or she brings something back to you, thereby serving you, the client, and the relationship on the web is pretty much the same thing. We are the clients. Our browsers are the clients and out there are servers like these who are serving up content and information, such as on Facebook. So let's consider how all this data even gets to us. So odds are, these days, if you want to visit Facebook.com on your laptop or desktop or phone without using the app, you probably just type in Facebook.com and hit Enter. If you're a little older school or literally older, you might just actually type out the entirety of www.Facebook.com. 
Both work and there are technical reasons for that related to the topic we'll talk about today, but both of these work just because Facebook has configured their website to work in either of those addresses. Now, as an aside, why are so many websites therefore prefixed with "www" if both of them actually work? Like, why have both? It seems just like redundant to type "www." if it's implied by Facebook.com. Yeah, what do you think? AUDIENCE: Is the "www" required? DAVID J. MALAN: Is it required? Nope, not required. Not required. Yeah? AUDIENCE: Is it to identify that it's part of the World Wide Web? DAVID J. MALAN: Kind of, yeah. It's to identify that it's part of the World Wide Web and no one really says World Wide Web these days. We of course just say web, but back in the day and back in my day, frankly, it wasn't obvious to a lot of human beings what Facebook.com might actually even mean, irrespective of the fact that it didn't exist at some point. And so there was this sort of signal to the world whereby you just started prefixing domain names with "www" just to make super clear to users, oh, this is a website! This is one of those things on the internet or the like, and also back in the day, there were also different services that have fallen into disuse these days, like FTP was quite popular and Gopher, we used it when I was here and other such things. And so "www" was just an arbitrary prefix that just kind of said what it is, but these days we humans pretty much know what a .com is and .net and .edu, but even that kind of road is changing again because there's dozens, hundreds of top-level domains. It's not just .com and .edu and others now. I mean, there's hundreds of these things out there and so it might even be non obvious to this day. 
So some people, therefore, go really all the way in and type out http:// and then the address that they want to visit, but odds are most of us don't do this because our browsers just help us out and prefix that, but that's where our focus will be today. Like, this actually has significance because it specifies what protocol or language, what convention your computer, your laptop should use when talking to that server's address, and actually, if you want the communications to be secure, odds are your typing or your browser is doing it for you. Adding an s there, denoting secure or encrypted a la Caesar and Vigenere from some weeks ago and technically your browser is also probably adding a trailing slash even if it's not shown to you, which denotes you want the root of the server, like the default homepage or something else. In fact, maybe you do want something else. You don't want just Facebook's website, you want Mark's page, and so you could specifically /zuck or whatever the username actually is. So this is a very long way of saying all this kind of stuff that we type or autocomplete and take for granted these days actually has some very fundamental meanings, all of which make possible the entirety of the web. So what actually goes on with HTTP and what does that actually mean? So HTTP is a protocol. It is a set of conventions that dictate how a computer client, like a browser on your Mac or PC, talks to a web server, and it's a protocol in the sense that it's not a language, per se. It's really just a set of conventions and so like this is kind of an arbitrary and awkward human convention. Hello, I'm David. AUDIENCE: I'm Kara. DAVID J. MALAN: Kara, so Kara and I just introduced ourselves. I extended my hand and she kind of knew instinctively that it would be awkward not to shake my hands or to shake my hand and so we exchanged pleasantries and said hello. 
So this is just kind of a silly human convention whereby we've agreed sort of socially in advance how to greet each other in that way. So HTTP is pretty much the same thing, but in this case you're not actually physically doing something like that. You're kind of sending a message from client to server. You're putting a sort of handwritten note into an envelope like this, addressing it somehow and then sending it off on the internet for Kara or for Facebook.com or Google.com to actually receive, and then when Google or Facebook or Kara receives that note, reads and sees what I want, the server or the human responds accordingly. So what then goes inside of this envelope? Well it turns out that when a web browser, like Chrome or Edge or Firefox, Safari, makes a request, the message it puts inside of one of those envelopes, albeit virtually, is literally this text. It's like if I had it written down on a piece of paper literally GET / HTTP/1.1 Host: www.facebook.com and then the "..." just means there's other stuff in there, but it's less fundamentally interesting right now. So what's this all mean? GET is just a verb and it kind of says what it means. Go get something from the server. HTTP/1.1 mentions the version of HTTP that I am using or the human convention that Kara and I were actually implementing there, and so 1.1 tends to be the one most in use these days, and then /, again, it's just like the default identifier for the homepage of a website, the default page that you see in the absence of typing something like /zuck or some other suffix. Host: is the same thing as whatever's on the outside of the envelope. So if I'm sending a message to www.Facebook.com, I'm just making super clear inside of the envelope which server should expect this request just in case there are multiple websites running on the same physical server, which is possible for economic and performance reasons these days. 
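The contents of that envelope can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical helper named `build_request` that is not part of the lecture; it assembles the text of the request and does not send anything over the network.

```python
def build_request(host: str, path: str = "/") -> str:
    """Assemble the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, path, and protocol version
        f"Host: {host}",         # which website on the server we want
        "",                      # a blank line ends the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a newline.
    return "\r\n".join(lines)


# The default homepage, then a more specific page on the same host.
print(build_request("www.facebook.com"))
print(build_request("www.facebook.com", "/zuck"))
```

A real browser adds many more lines after Host:, which is what the "..." stands for; those are left out here on purpose.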
So alternatively, if I were trying to visit Mark Zuckerberg's homepage, the request in that envelope's going to look almost the same, but I'm going to be more precise. /zuck instead of just /. Meanwhile, if I'm requesting something from Yale's homepage, the request would look like this, or from Harvard's web page, the request would look like this and so forth. So once Harvard or Yale or Facebook have actually received the request in that envelope, opened it up, looked at it, how do they decide how to respond? Well at the end of the day, I'm probably expecting to get back from the web server some kind of, excuse me, web page whereby I want to see my news feed on Facebook or I want to see the search page on Google or I want to see Harvard's homepage, Yale's homepage or whatever it actually is. So there's a lot of information probably packed into that envelope, but there's also a conventional, a standard, response that looks literally like this. So at the very top, for instance, of the "letter" that comes back from Google or Facebook is a message like this. Got it. I'm speaking HTTP version 1.1 also. Everything is OK and 200. We'll come back to that in a second and then the type of content inside of the envelope, if I keep digging deeper into it, is going to be text, but more specifically, HTML, and we're going to focus on that today too. HTML, Hypertext Markup Language. This is going to be the language in which web pages themselves are written. Then there's usually some other stuff and way down there is the actual contents of Yale's or Harvard's or Facebook's homepage, but let's zoom in on this for just a moment. 200, odds are you've never seen or cared to see this kind of number before, but have you ever used the web and requested a web page and seen some number that for some reason keeps popping up in your life? AUDIENCE (IN UNISON): 404. DAVID J. MALAN: Yeah, 404. 
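To make the shape of that response concrete, here is a rough Python sketch that pulls apart the top of the "letter" a server sends back. The function name `parse_response_head` and the sample text are assumptions made up for illustration; a real response carries many more name-and-value lines than this.

```python
def parse_response_head(head: str):
    """Split the top of an HTTP response into version, status code, reason, and headers."""
    lines = head.split("\r\n")
    # The first line has no colon: version, numeric status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line marks the end of the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers


# Illustrative sample only, mirroring the lines described in the lecture.
sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
print(parse_response_head(sample))
```

The same function handles a 404: only the number and the phrase on the first line change.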
It's just kind of a weird thing that many of us in the room know 404 even if we're not necessarily technophiles and know what HTTP is, but it turns out that in these envelopes coming back from servers sometimes are not just 200 OK, but instead-- dammit, typo. This would be much more effective if I said it's not this. It's not found. So inside of the envelope is 404 not found, which means exactly that. The file was not found that you were actually seeking. You mistyped the URL, the page was deleted. Somewhere or other, there was some kind of typographical error and it turns out there's a lot of the status codes in HTTP and there are even more than these, but these are the ones we might see the most commonly. 200 OK means all is indeed well. 404 means not found. 403 forbidden might be if you've not logged in or don't have the right access in order to access some folder or file on some website. This is really bad and we'll get to know this over the coming weeks as we ourselves start implementing code on a server. 500 internal server error, if you will, shall be our new segmentation fault, but hopefully not too frequently. It means something is wrong in the code on the server. This was an April Fools' joke back in 1998 I believe, yeah. So April Fools', some humans decide it would be funny to announce to the world that there's yet another code, which is 418 I'm a Teapot, which kind of comes up from time to time in actual code and then there's this one-- 301 Moved Permanently. It's kind of a scary sounding thing, as though a website just kind of up and left and went elsewhere, but it's a powerful mechanism in the following way. If a server inside of one of these envelopes responds with a response like this, there tends to be one other piece of information at least. So if I visit a website like http://harvard.edu, I might get back in the response from Harvard's web server this answer, 301 Moved Permanently. Like where the heck did Harvard go? 
Well you can see the location based on this other line and all of these things collectively moving forward we're just going to call HTTP headers. Anytime you see a word and a colon, that's an HTTP header with a name and a value and the first one's a little anomalous in that there's no colon, but that's the only one without the colon. So location colon http://www.harvard.edu. Well what's going on? Well, if I actually visit Harvard's homepage exactly as follows, let's take a look at what happens. I'm going to go to http://harvard.edu, Enter. And notice there's a whole bunch of more stuff happening on the screen thanks to what's called autocomplete, which is a feature of Chrome or my browser. It has nothing to do with the topic at hand. This is just Chrome trying to be helpful today as on your computer too and suddenly, even though I tried to go to http://harvard.edu, where did I clearly end up? HTTPS, so they added the s somehow and what else has it added? [VARIOUS ANSWERS FROM AUDIENCE] DAVID J. MALAN: Yeah, the web. The www prefix was added. So this is not sort of all that important to the user like I got to my destination somehow but the reason for that is as follows. I'm going to go ahead and open up, in the IDE actually, just a terminal window here and I'm going to use a new program called curl, for connect to a URL: curl http://harvard.edu, Enter. And I get back some cryptic looking things and that's actually HTML, and we're going to come back to this in just a moment because it turns out there's two parts to the messages coming back. There's the headers and then there's the content, and we're seeing the content. So more on that in a bit. I want to look a little higher up in the response and literally just look at the headers, and to do that-- and you would only know this from reading the documentation-- -I means show me just the headers that are coming back. 
So here now we see the headers coming back and you'll see indeed we got back a 301 Moved Permanently, and then there's some other stuff we haven't really focused on, but at the bottom is something we have-- location, which says to the browser go to this URL instead. All right, so let me do that. Let me save time and just copy paste this and then do curl -I of this, Enter, and pretend to be a browser requesting that page now, but now where are they trying to send me? HTTPS. So this suggests via some mechanism, some human at Harvard decided one, uh-uh. We're not going to be called like harvard.edu. We shall be www.harvard.edu for whatever reason and then they also decided that if a user visits us using HTTP, which is not encrypted, not secure, we're going to forcibly tell them to come back via secure channel, and we won't dwell today on how that's implemented, but much like in Caesar or Vigenere where there was a way to encrypt or scramble information, browsers can do that too and it's implied by using the HTTPS instead of just HTTP. All right, so let's actually visit this one more time. Let me go ahead and highlight that location. curl -I of that address and now an overwhelming amount of information coming back, and that's why I kept putting the ...'s, but the juicy stuff is at the top. Now everything is 200 OK and indeed, if I run it without -I so I see the contents of the envelope, it's like looking deeper inside of the envelope, now I actually see a lot more content, which collectively composes Harvard's homepage, and it turns out we can see this even in Chrome. Let me go over to my browser again and if you've not done this before, it turns out that you can go to your View menu, Developer, and go to Developer Tools-- and we'll do this in upcoming problem sets-- and I can go here and see a whole bunch of features, only a couple of which we might look at today. Specifically, I'm going to click on this Network tab. 
So to be clear, Developer Tools in Chrome still shows me the homepage, but it kind of dedicates part of the screen to these special developer tools that make it easy to understand and actually create websites. So eventually we'll start using this ourselves, but what's nice about the Network tab is that you can sniff or monitor all of the requests going back and forth between browser and server in the so-called envelopes. So I'm going to hit a little Clear symbol here first just to get a clean slate. I'm going to click preserve log so I can actually see what's happening and now I'm going to go ahead-- actually, I'm going to go ahead and do this. http://harvard.edu, so the sort of incorrect version that I'm going to expect the browser to fix for me. I hit Enter. A whole bunch of stuff is flying across the screen and in fact if we zoom in on this, you can see that just visiting Harvard's home page requires 85 envelopes it would seem going back and forth with pieces of the webpage and we'll see soon what some of those pieces are, but it's not just one file coming back. It's bunches of files. Maybe images, maybe fonts, or some other things too, but I'm going to scroll up in this output and now notice the story that's been told here too. So the very first request, which I can hover over and see, came back with a 301, which we now know is Moved Permanently, or it's a redirect. Then if I hover over the second one, you'll see that it's a slightly more precise URL, www, but still with HTTP. So that got redirected and then lastly, if we look at the third line here, this is the one we ultimately ended up at and indeed it comes back 200, as do bunches of other results thereafter, and we'll see what those 200s actually mean. Now, you can do a little better than this and it's perhaps fitting that our friends down the road indeed did. Let me go back to the IDE. 
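The chain of redirects seen in the Network tab can be mimicked with a small Python sketch. Nothing here touches the network: `FAKE_WEB` is a made-up table of canned responses standing in for the servers, its exact URLs (including the trailing slashes) are assumptions, and `follow_redirects` is a hypothetical helper. Only `http.HTTPStatus` comes from Python's standard library.

```python
from http import HTTPStatus

# Hypothetical canned responses: URL -> (status code, Location header or None).
FAKE_WEB = {
    "http://harvard.edu/": (301, "http://www.harvard.edu/"),
    "http://www.harvard.edu/": (301, "https://www.harvard.edu/"),
    "https://www.harvard.edu/": (200, None),
}


def follow_redirects(url, responses, limit=10):
    """Return the list of (url, status) hops visited until a non-redirect response."""
    hops = []
    for _ in range(limit):  # a cap so a redirect loop cannot run forever
        status, location = responses[url]
        hops.append((url, status))
        if status != HTTPStatus.MOVED_PERMANENTLY or location is None:
            break
        url = location  # go where the Location header points
    return hops


for hop_url, status in follow_redirects("http://harvard.edu/", FAKE_WEB):
    print(status, HTTPStatus(status).phrase, hop_url)
```

With this table the sketch reports two 301 hops followed by a 200, the same story the Network tab told.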
Let me go ahead and clear this and instead of curling harvard.edu, let me do http://yale.edu and ask the question, what would be a better approach-- knowing these ingredients that we now have of how redirects work. How could Harvard do better in terms of getting the user to the address that we intend them to be at? Yeah. AUDIENCE: By not forcing like, two redirects? DAVID J. MALAN: Yeah, by not forcing two redirects, right? Even if some of this material is new, we've long talked now about correctness and design and style and we've seen some messy style on the screen and that's fine for now. More on that later. It seems to be correct because it's working, but it feels like it could be better designed because why make one request then make another request just to fix the first request then make a third request just to fix the second request? Why not combine them? And, as it turns out, someone down the road had that same intuition and so we visit yale.edu with just HTTP and without the www, they, in one fell swoop, actually redirect us to the right place in this case. So, with that said, it's perhaps fitting that just a few years, well, some years ago now, you might have tried to visit this particular address, and this is something I can only do in Cambridge. If I go ahead and open a new browser and go to http:// shall we say safetyschool.org and hit Enter if you've never been. Oh, interesting! [STUDENTS LAUGH] DAVID J. MALAN: And apologies for those of you tuning in online live from New Haven. So how is this possibly working? It's actually a very simple heuristic. 
If instead of selecting Yale or Harvard or any other address, if I literally do like safetyschool.org, we can wrap our mind around what's going on underneath the hood safetyschool.org has moved permanently to New Haven it would seem, but it's via this very simple mechanism that someone back in 2000 registered this domain name, and so actually as I was looking this up in the history last night, I was amused to find that whoever bought the domain has been paying for this domain name now for 17 years for this joke annually, but it's well worth it, but I think it would be-- [STUDENTS LAUGH] DAVID J. MALAN: But I think it's only fair now, it's only fair if we take a look at another one too. It turns out that if you visit harvardsucks.org, that one has also redirected, this time to www. So let's follow this little breadcrumb. curl -I harvardsucks.org, and this one's OK. So that means something lives at harvardsucks.org and it does not as cleverly redirect to harvard.edu, but to introduce this, let me actually introduce a friend of ours who's now very awkwardly visiting from New Haven today. Hi Natalie. Do you want to come on up and say hello for just a moment? So this is Natalie, who is our head of the class with Benedict Brown and [? Anushri ?] and with [? Staleos ?] in New Haven. If you'd like to say a quick hello? Hi, Hi, everyone. DAVID J. MALAN: So nice to have you here today and as you know-- do you want to make mention of what we're about to see here? What happened back in 2004 just a few years later? AUDIENCE: We did a prank back, basically. DAVID J. MALAN: OK, so perfect set-up. Thank you very much. Hello to Natalie. Let me go ahead and hit play on three minutes that are kind of hard to justify academically, but it's perhaps one of the best pranks that's ever been played. 
Long story short, our friends down the road got together with a few of themselves just before Harvard Yale, which was to be at Harvard that year and actually mapped out using software, a sort of grid system that lined up with all of the seats in the Harvard stadium, whereby you assume that a human each takes up some amount of space, and then they used special software to figure out how they might spell something out in the audience in a way that would be readable to the opponents, the Yalies, on the other side. So if we could dim the lights for this look back at yesteryear and a slight use of software. [MUSIC PLAYING] - All the way at the top. - This is for you Yale. We love you Yale. - We're here to cheer for Harvard. - Yeah! Let's go Harvard! - Yeah, Harvard! - Take the top one and pass it down. - It's not going to say something like Yale sucks is it? - It says Go Harvard. - We're nice. - You see that shit? Look at them, they have the paper! It's gonna happen! It's actually gonna happen! I can't [BLEEP] believe this. - What do you think of Yale? - They don't think good! - It may be a complete mess, I don't know. - Does everyone have it? Does everyone have their stuff? - The probability that it's gonna be legible is very small. - It's gonna happen! It's gonna happen! - It's too complicated. - Look, look at all the signs. - I know but it's too complicated. - Uh, what houses are you guys in? That's not a real house. - Ho-fo? - Yeah. You guys aren't from Harvard are you? - No, fo-ho. Pforzheimer! - Yeah, but he said ho-fo. - Let's just make sure everyone has it. - Well she's probably drunk. - Are all the cards distributed? - Almost! [APPLAUSE] [CHEERING] - Hold up your signs! - They [BLEEP] did it! [CROWD CHANTING "YOU SUCK!"] - They [BLEEP] did it! They [BLEEP] did it! [CROWD CHANTING "YOU SUCK!"] - What do you think of Yale sir? - They suck! - One more time! - One more time! - Oh and there it goes again! [CROWD CHANTING "HARVARD SUCKS!"] [END PLAYBACK] DAVID J. 
MALAN: All right, we've been talking about what goes on inside of this envelope, but what goes on on the outside? So when you hand off this envelope from your laptop or your phone to the internet, how does it actually get to its destination? Well you've probably heard this acronym IP, or internet protocol, and it turns out that every computer on the internet and every phone in this room and every laptop in this room has a unique address. That unique address is known as an IP address and it's much like the address of a building in the real world, like the Science Center might be 1 Oxford Street Cambridge, Mass 02138, USA. Down the road is the CS building. 33 Oxford Street Cambridge, Mass 02138, USA. So those long strings uniquely identify buildings in the world for the mail service and the like and similarly do IP addresses uniquely identify computers on the internet. These addresses are much more succinct though. They're not long strings, they're instead just numbers that have four parts and each of those numbers within the IP address is a value from 0 to 255. So the lowest IP address is all zeros and the biggest IP address is all 255s with some constraints. You can't quite use all of those numbers. So just as a sort of quick teaser, if the smallest number is 0 and the biggest number for each of these sections of the IP address is 255, how many bits are being used for each of those four numbers? AUDIENCE: 8. DAVID J. MALAN: Yeah, 8. So remember like 8 bits gives you 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2, which is 256, and indeed we have 256 total values from 0 on up to 255. So an IP address is 8 plus 8 plus 8 plus 8, or 32 bits total, or, just to come really full circle with week zero, if you have 32 bits, roughly how high can you count? Like what's 2 to the 32 power? Yeah, it's roughly 4 billion. 
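The arithmetic above can be checked in Python. This is a small sketch, assuming a hypothetical helper named `ipv4_to_int` that packs the four parts of an address into a single 32-bit number, 8 bits at a time.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dotted address like 1.2.3.4 into one 32-bit integer (hypothetical helper)."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError(f"not a four-part IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # make room for 8 more bits, then add them
    return value


print(2 ** 8)                          # 256 values per part, 0 through 255
print(2 ** 32)                         # 4294967296, roughly 4 billion
print(ipv4_to_int("0.0.0.0"))          # 0, the lowest address
print(ipv4_to_int("255.255.255.255"))  # 4294967295, the highest address
```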
So, long story short, the implication of this very simple definition is that apparently there can only be, in this model, four billion computers, phones, refrigerators, internet of things, devices on the internet at once if they do all need an IP address that's unique. So I've been telling a slight white lie in that they don't have to all technically be unique because there's ways we can share addresses, and it turns out there's even bigger addresses these days that aren't just 32 bits but 128 bits, which is just massive and daresay unpronounceable how big that number is. So we've gotten ahead of this issue, but you'll find that in a lot of locations, companies and internet service providers like Comcast and Verizon and the like and campuses like Harvard and Yale, you can notice that they tend to follow patterns, like many of the IP addresses here at Harvard start with 140.247.something.something or 128.103. Down the road in New Haven, a lot of the IP addresses there start with 130.132 or 128.36, which is not at all interesting to the humans who are using these IP addresses, but it is useful to the servers or the devices that are actually routing these envelopes from one place to another. Meanwhile, in our homes and even sometimes on campus these days, there are also what are called private IP addresses, which are numbers within these ranges, and this has been a solution so that when you sign up for Verizon or Comcast back home or your parents do for internet service, you technically only get one IP address from your internet service provider. That's what you're paying for per month, but thanks to something called network address translation and other technologies, you can actually give all of your siblings and parents and family members or roommates in the household their own unique address. It's just private in the sense that no one else on the outside world can access it unless you initiate the connection. 
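Python's standard `ipaddress` module can tell private addresses from public ones, which makes for a quick sketch of the distinction. The specific host addresses below are made up for illustration, chosen only to start with the campus prefixes mentioned above or to fall in the well-known private ranges (10.x.x.x, 172.16.x.x through 172.31.x.x, and 192.168.x.x); `classify` is a hypothetical helper.

```python
import ipaddress


def classify(text: str) -> str:
    """Label an address as private or public using the standard library."""
    addr = ipaddress.ip_address(text)
    return "private" if addr.is_private else "public"


# Illustrative addresses only; none is claimed to belong to a real machine.
for text in ["140.247.0.1", "130.132.0.1", "192.168.1.10", "10.0.0.5"]:
    print(text, classify(text))

# The newer, bigger addresses use 128 bits instead of 32.
print(2 ** 128)
```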
So this is generally why at home you can reach any website you want any service on the internet that you want, but you can't have like random people necessarily trying to get into your laptop or your device at home because there's a device, a home router, that translates these private addresses into otherwise public addresses, but for now the takeaway really is just that every computer on the internet has an IP address, and if you've ever poked around your Mac, like under System Preferences, you can actually see this. So I've just pulled up a screenshot here of a network control panel on Mac OS and if you look roughly there on your own Mac, you should see that your IP address is something. It will completely vary by person and by geography, but you'll see your IP address there. On Windows, at least Windows 10, you can see your IP address under Settings here as highlighted here. So this has a very different address, but that's just because this person was on a different network all together. So, where did these IP addresses come from? Well back in the day someone would literally come to your home to set up your Comcast or your Verizon internet service and he or she would like type in these numbers into your Mac or PC and then leave, and you would have one computer on the internet. These days it's a lot more dynamic. You don't need someone coming by. That certainly doesn't scale very well because there's other protocols. HTTP is this protocol we talked about earlier about web pages, but there's other protocols like Dynamic Host Configuration Protocol, which is a mouthful but it just means that our Macs, our PCs, Android phones, iPhones and the like, if they speak this protocol, when you first turn on your phone or boot up your laptop it knows, if it has support for this protocol, to just announce to the internet, hello world. I'm awake. What should my IP address be? 
It's just kind of a broadcast message, and if Harvard or Yale or Comcast or Verizon or wherever you are in the world has a DHCP server whose purpose in life is just to listen for those hellos, that server should respond using the same protocol with your actual IP address, and it figures out which one to give you based on an available pool of numbers typically. So that's how you might get this, but there's other things in these control panels. In fact, if we look a little lower on Windows, there's DNS servers too. Domain Name System. Another acronym and a bit of a mouthful, but you can see this on Mac OS too if you actually click Advanced and actually take a look. Here, for instance, there's mention of something else altogether, a router. So there's lots of different addresses going on here and lots of different servers. So how do these all piece together? Well, DNS is an interesting one in that it's going to be the one that translates domain names to IP addresses, right? None of us ever probably visits http:// and then a number, right? Like, we visit facebook.com, google.com or the like, but that's because our computers know how to translate one to the other. So in fact if I do this command, nslookup, for name server lookup, and then I type in something like google.com, I'm asking the computer, in this case, the IDE, what is the IP address of google.com. I know it as the human as google.com, but the internet knows it by its numeric unique address, and it turns out Google has several, and even this is a bit of a white lie because they have thousands, but the ones that my computer is being told to use are, for instance, this one or this one or any of these other addresses. So let me see what actually happens here. If I highlight that address and open up a browser and go to http:// and that IP address and hit Enter, notice it actually seemed to work. Well, why is that?
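The nslookup step above can be approximated in code. This is a sketch only: it swaps nslookup for Python's standard `socket.getaddrinfo`, which asks the operating system's resolver rather than querying a name server directly, and the function name `lookup` is made up for the example.

```python
import socket


def lookup(hostname):
    """Return the IPv4 addresses the system resolver reports for a name.

    A name may map to several addresses, as google.com does in the lecture.
    """
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

Calling `lookup("google.com")` needs a network connection, and the addresses returned vary by location and over time, so no particular output is shown here. Passing something that is already an address, such as `lookup("127.0.0.1")`, simply returns it unchanged.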
It's a little hard to see it in Chrome, but let's go ahead and open up the Inspect tab and go to Network just like before. Let me click Preserve Log so that it saves everything here, and I could be using curl. So the curl was just the simpler version. Now I'm using the more familiar graphical version. Let me go ahead and do that again and go to http:// and that IP address and hit Enter. A whole bunch of stuff flew by even just for Google's homepage, but notice what happened. On that very first-- whoops-- request, if I hover over it, I see http:// and then the number that I typed in, but it's a 301 because, what was the response? We can actually see these responses. Let me click on the status code here, or the row, go to Headers and notice here, if we zoom in, we'll see that Google responded with this location. So someone at Google just decided, OK, fine. You figured out one of our IP addresses. That's great, but we don't want you to see that in the URL. It's bad for branding. We don't want you to bookmark an IP address because it might change later on. So we're using the same mechanisms as before, but that's how we might do the lookup and we can see the same thing for any number of websites. Here we go nslookup of harvard.edu and we get back just a couple here. If I do the same on Yale, I'm going to get back different IP addresses. Yale has even more in this case and so this is how the computer's figuring out to where to send the data. So what goes on this envelope then, it's going to be not facebook.com harvard.edu or yale.edu, it's actually going to be the address like 1.2.3.4 or whatever the actual IP address is of the server I'm trying to send to. Now, of course, I expect a response from the server. I want to get back my news feed or I want to get back Harvard or Yale's homepage. So what more should I probably put on this virtual envelope, just intuitively? Yeah. AUDIENCE: Your own IP address? DAVID J. MALAN: What's that? AUDIENCE: Your own IP address. DAVID J. 
MALAN: My own IP address, yeah. So just like in the human world, just in case something goes wrong with the post office, I might put my own address, 5.6.7.8, and actually put that on the envelope so that if something goes wrong or, better yet, if something goes right and they're ready to give me a 200 OK, it can actually come back to me because they know from which address this thing actually came from. So who is it or what is it that's doing all of this routing? Well it turns out there's servers on the internet called quite simply routers, otherwise known as gateways, which is just a synonym, and they're kind of artistically pictured here as just dots across the world, and there's hundreds, thousands, tens of thousands of routers. Odds are you yourself at home, if you had internet access, have at least one such router and its purpose in life, again, is to take data from inside your household and send it to the internet, and then any responses you get, to send it back to the appropriate laptop or desktop or phone or smart device that happens to be in your own home. And we can actually see this too. Let me go ahead and in CS50 IDE, try one other command. I'm going to go ahead and type traceroute and I'm going to trace the route, say, to yale.edu from here, or technically from the IDE. So if I hit Enter here, we're going to see a few lines of output, and if you try this at home, just realize I've configured my IDE a little differently to simplify the output. So it looks like there's five steps between Cambridge and New Haven or technically the IDE and New Haven, but what are each of these steps? Well between here and Yale, if we continue that version of the story, there are, it seems, five routers. There are five computers that have like lots of RAM, big CPUs that can handle a lot of internet traffic that are figuring out how to get my envelope from this origin to this router, to this router, to this router, to this anonymous router, to this one. 
Sometimes the routers are configured not to answer these questions from this program traceroute. They sort of keep it to themselves, and you can see on the right of each of these IP addresses some numbers. So just take a guess, what does each of these numbers represent, perhaps? What's that? No, it's OK. AUDIENCE: Milliseconds? DAVID J. MALAN: Milliseconds, yep. So milliseconds that are measuring what, do you think? Time to go, or time to reach that specific router. So we can kind of infer-- and this is the kind of amazing thing. To get me to New Haven takes like two plus hours, but to get an email, to get an envelope with a message, takes like 10.597 milliseconds to get data from here to there, and then hopefully back if it's a request for a page. Let's do something a little farther away. So let's do like stanford.edu, tracing the route here, and already we can see that the numbers are a little bit higher, and that makes intuitive sense in that Stanford's a little farther away than New Haven and it takes as many as 41 milliseconds to reach that. If I go even farther and I read like a company's news like cnn.co.jp, where .jp is the top-level domain for a lot of servers in Japan, you can see a real uptick in just how many milliseconds it takes, and in fact, there's something curious here. Why does it take so much more time to get from router number three to router number four, do you think? AUDIENCE: The ocean. DAVID J. MALAN: The ocean, yeah. So there's a really big body of water in between the US's west coast and Japan's coast, which probably explains why not just between three and four, but really every router thereafter is that many milliseconds away. So these aren't cumulative. We're measuring constantly from here to there, from here to slightly farther, from here to slightly farther. So it makes sense that once you cross that ocean, that's kind of the total value that you're actually going to see, and it's fascinating really.
I mean, throughout the entire world there are not only wireless technologies today, but very much wired technologies, and if we take just a few seconds, we can see this visualization of so many of the transoceanic cables that have actually been dropped by big ships and that carry many, many, many, many bits from one coast to another. [VIDEO PLAYBACK] [MUSIC PLAYING] [END PLAYBACK] So, with all of those cables capable of transmitting data all around the world, it turns out there's still one more problem, even if we want to do something simple like download an image of a cat from the internet, because there's different types of servers out there. There's my computer here, like my laptop. I'm running Mac OS or Windows. There's all those servers in Google's data center and in their racks and Facebook's and the like, and in between all of those servers there are lots of routers, but it turns out that those servers in those racks at Google, at Facebook, even at Harvard and Yale, can do multiple things because technically, even though we humans tend to talk about servers as being physical devices, a server is, as we started today, really just a program. It is a piece of software that someone wrote that, when run, listens for requests on the internet and responds to those requests, generally by spitting out information, text or 0s and 1s or, in some cases, cats. So upon receiving an envelope, then, how is it that a server knows whether it's a request for a web page or it's an email or it's a chat message or a voice message or any number of other things? It turns out we need one more piece of information at least on this envelope. It turns out that the world has standardized via another protocol called TCP, Transmission Control Protocol, that you need at least one other number on these envelopes, and that number corresponds to the type of service that you're trying to access or the type of data that you're trying to send or receive.
So, for instance, 22 is for something called SSH, Secure Shell. This is something that most CS majors might use, but most people in the world wouldn't use this because it's entirely command line and it allows you to connect securely to some remote server without using something like a browser, but all of us generally do use browsers and HTTP, it turns out, all this time has had a unique number associated with all of those requests. 80 is the number, and if we visited any URL starting with https, turns out there was a special number, 443, that humans years ago decided should uniquely identify encrypted web traffic, requests and responses. 587 is used for Simple Mail Transfer Protocol, which is for email. Excuse me, 53 itself is used for DNS. So if you ever send a message to a server saying what is the IP address of google.com, you're using number 53 to identify whatever machine or software can answer that type of question, and so we can actually see this too. If I go back to my IDE and I actually do curl -I https://www.harvard.edu, this of course worked before and it was 200 OK, but it also will work if I more precisely say specifically send this request to TCP port, or number, 80 and-- damn it. Oh, it's wrong because I made a compelling pedagogical mistake. So what did I do wrong? AUDIENCE: Https. DAVID J. MALAN: Yeah, so I kind of screwed up my numbers here. So I said https, but I meant to say http if I'm using port 80 or, conversely, if I want to talk to the secure port, which is known, I actually want to say 443, and that one in fact works, and I can do it again even in Chrome. If I go up to my browser and go to http://yale.edu:80 and let the redirects happen, that too will work. It's just that browsers, to keep our minds focused on the website we're actually trying to visit and not distracted by technical details like :80 or slashes or even sometimes http itself, just hide that from the URL bar. It's all there.
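To make the port discussion concrete, here is a small sketch, not from the lecture, that works out which TCP port a URL refers to. It uses Python's standard `urllib.parse.urlsplit`; the `DEFAULT_PORTS` table and the `effective_port` name are written for this example and cover only the two schemes discussed above.

```python
from urllib.parse import urlsplit

# Defaults mentioned in the lecture: 80 for http, 443 for https.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port in a URL, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the number written after the colon
    return DEFAULT_PORTS[parts.scheme]
```

With this, `effective_port("https://www.harvard.edu")` is 443 and `effective_port("http://yale.edu:80/")` is 80. The colon matters: in a URL such as `http://yale.edu/80`, the `80` is parsed as the path `/80`, not as a port.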
It's all happening, but we humans are getting a little more comfortable with the internet over the years so Chrome and other browsers are just starting to hide some of these lower-level implementation details. So that really means, when I actually want to send a request to a web server, I should really write :80 on the envelope to make clear that that's going to a web server listening on port 80 or maybe 443, and then, you know what? It turns out, and we won't dwell too much on the details, even my Mac or your PC also has its own port number for all of these requests, right? And it would be pretty annoying if you could only visit one website at a time or you could use Gmail or Skype but not both at the same time, or Facebook Messenger or Google Chat but only one at the same time. That would be pretty limiting, especially when we have all this computing power. So it's also the case that your own computer, any time you send a request on the internet, chooses a random or pseudo-random number to uniquely identify the piece of software on your computer that's waiting for the reply. So this might be not port 80. This is going to be a bigger number like 1025, or some large-ish value all the way up to 65,000, even, or 32,000 that now uniquely identifies the port on my computer, and that's how your computer can do multiple things at a time, and when I get the response those values are just flipped, but there's one more piece. Like cats can be pretty high quality and videos certainly take up a huge amount of data. Netflix videos and any streaming videos are taking up a huge amount of information and it would be pretty annoying to your neighbors if any time you were watching a movie on Netflix, you had to be done watching the movie in order for a neighbor to also watch a video on his or her computer as well. So it turns out that what computers also do thanks to IP and TCP is, when they're used together, they offer one more feature still. 
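The client-side port described above can be observed directly. The sketch below is an illustration under a stated assumption: a listening socket on the local machine stands in for the web server, so nothing leaves the computer. It uses only the standard `socket` module, and the function name is made up.

```python
import socket


def connect_and_report():
    """Connect to a local listener; return (client_port, server_port, seen_by_server)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
    listener.listen(1)
    server_port = listener.getsockname()[1]

    # The client never chooses its own port; the OS assigns one for it.
    client = socket.create_connection(("127.0.0.1", server_port))
    connection, peer_address = listener.accept()

    client_port = client.getsockname()[1]
    seen_by_server = peer_address[1]  # the source port the server will reply to

    for sock in (client, connection, listener):
        sock.close()
    return client_port, server_port, seen_by_server
```

The port the client was given is the same one the server sees as the source of the request, which is how a reply finds its way back to the right program. The exact number differs on every run and every operating system.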
It turns out that if I want to download a picture of a cat, and we have a nice printed version here, I'm not going to get the whole cat in the one envelope most likely. This cat or this video file or whatever it is is actually going to be divided up into a few different pieces. So this message might get chopped or fragmented into four pieces. Each of those four pieces now might go in one of these envelopes here, here, and then here with the third and fourth, and what's nice, though, about TCP/IP is that it provides at least two features for us. One, IP ensures that every computer on the internet that speaks this protocol has an address. So IP handles the getting of the data to some destination. TCP, the other half of this, ensures or guarantees with high probability delivery-- that the data actually gets there. Because as you might have gleaned from even the animation of all of the transoceanic cables and all of the interconnections among routers, things can go wrong, right? Routers, it turns out, can get overloaded. Their buffers can overflow such that they can't handle all of the traffic coming into them, and in fact, if you've tried to watch Game of Thrones, some episode on HBO, and you couldn't access it at some point, or [INAUDIBLE] or some tool like that, if they're overloaded, what does that mean? It just means the server, or the routers between us and the server, are getting so many darn envelopes that they just can't keep up and can't hold onto them all at once, and so sometimes packets do get, so to speak, dropped, both physically and also digitally, and this means some packet is lost. And so what's nice about the internet is that when my computer here talks to the nearest Harvard router that may very well have antennas in a room like this or an access point, I might send off a packet here and here and let's send this all the way to the back if you could, but these packets, as you can see, don't necessarily need to travel the same [? path ?]
because-- what's your name in the second row? AUDIENCE: Monsi. DAVID J. MALAN: Monsi. So Monsi is getting a little busy. So Kara, if you could route to someone else. This is literally the effect that happens on the internet. If one router, like Monsi, gets a little bit busy and her attention is elsewhere or just has too many packets to deal with, she won't even necessarily drop it but maybe their path will just be routed around her, and that's what's nice about having this mesh network around the internet. Now unfortunately, one of those packets can get dropped and in fact this is a perfect example. If you want to drop it, drop it. Uh-oh, a packet was dropped! What TCP does for us is the following. Once those envelopes reach hopefully one specific person-- OK, you are the lucky winner. Whoever, wants to-- how many do we have? Two there? Where did the third go? That's OK. TCP can handle multiple packets being lost. AUDIENCE: It's over there. DAVID J. MALAN: Oh, and so packets also don't take the shortest path sometimes on the internet. So what might happen? So let's assume for the sake of discussion that those packets did make their way to at least one of our audience members here. He or she, upon receiving them, would also see not just the origin address and the destination address. There would also be some notation, like a memo line on the envelope saying 1 of 4, 2 of 4, 3 of 4, 4 of 4, so that the recipient can infer from that little hint whether or not they received all 4 or just, as in this case, a subset thereof, and in that case, assuming the computer speaks TCP, it can simply say, hey David, resend me packet number 1 or packet number 3 or whichever were actually lost. And so together all of this happens at blazing speeds. 10 milliseconds to do all that back and forth to New Haven, let alone even faster here on campus, but those really are the basic principles and building blocks that are just getting our data from one place to another. 
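The "1 of 4, 2 of 4" idea above can be sketched in a few lines. This is a toy model of the envelope demonstration, not how TCP is actually implemented: real TCP numbers bytes rather than whole envelopes and uses acknowledgments. All function names here are made up for the example.

```python
import random


def split_into_packets(data, count):
    """Divide data into numbered pieces: (number, total, payload)."""
    size = max(1, -(-len(data) // count))  # ceiling division, at least 1 byte
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(number, len(chunks), chunk)
            for number, chunk in enumerate(chunks, start=1)]


def missing_numbers(received, total):
    """Work out which packet numbers never arrived."""
    seen = {number for number, _, _ in received}
    return [number for number in range(1, total + 1) if number not in seen]


def reassemble(received):
    """Put the packets back in order and join their payloads."""
    ordered = sorted(received, key=lambda packet: packet[0])
    return b"".join(payload for _, _, payload in ordered)


cat = b"pretend these bytes are a picture of a cat"
packets = split_into_packets(cat, 4)

# Packets 1 and 3 get dropped, and the rest arrive out of order.
arrived = [packet for packet in packets if packet[0] not in (1, 3)]
random.shuffle(arrived)

lost = missing_numbers(arrived, total=4)  # the recipient asks for these again
resent = [packet for packet in packets if packet[0] in lost]
restored = reassemble(arrived + resent)
```

Here `lost` comes out as `[1, 3]`, and after the resend `restored` equals the original bytes regardless of the order in which the packets arrived.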
Of course, the real interesting stuff happens when we dig deeper into this envelope and look at the contents. Not just the cat as in this case, but the language, HTML, and something else called CSS, which we'll do shortly, but I thought it might be fun, especially on the heels of our look at forensics, to take a look at just how sort of presumptuous Hollywood tends to be when presenting us humans with technical details that now you'll perhaps have an even better eye for, in addition to the age-old "enhance" line. [VIDEO PLAYBACK] - It's a 32-bit IPv4 address. - IP as in internet? - Private network. [? Tamia's ?] private network. [STUDENTS LAUGHING] - She's so amazing. - Oh, Charlie. - It's a mirror IP address. She's letting us watch what she's doing in real time. [END PLAYBACK] DAVID J. MALAN: OK, so we'll hold it on this screen here because one, a few of you laughed when you saw the bogus IP address because the number was what? AUDIENCE: 275. DAVID J. MALAN: 275, which is too high, and that one we could forgive because you don't want like random people pausing their videos on the internet then trying to hack into or get access to that URL, but even funnier is when the hacker is being described as doing this on the screen as part of their attack. This is like the source code in a language called Objective-C for some kind of drawing program, as suggested by the use of crayons in the code as a variable. So let's pause there and when we come back in five minutes, we'll take a look at HTML itself. All right, so we're back and we're about to learn a new language. Though this might feel like a lot to do in just an hour, this one's a markup language. So it's not a programming language, which means you're not going to see loops. You're not going to see functions. You're not going to see conditions or any of the kind of logic that we have built into C and into Scratch and eventually Python and JavaScript.
You're instead going to see just what are called tags, pieces of English-like syntax that just tell the browser what to do and what to stop doing. So we're going to see tags that say start making this text centered. Stop making this text centered. Start making the text bold. Stop making the text bold. So these are very deliberate kinds of statements that we're going to express using something that's code-like, but it doesn't give you logical control. So as such, there's a pretty small language ahead of us and a lot of what you'll do when learning HTML is just check an online reference or an example online or look at the source code of actual web pages to just figure out how these things are done, and today we will focus on the fundamentals. So this is perhaps one of the simplest web pages you can write in a language called HTML. It's a text-based language. All of the tags resemble some English words and there's a pattern to the kinds of things that you might type. First of all, if you're using the very latest version of HTML, which happens to be version 5, it's been around for a while, you simply start every web page with this cryptic incantation at the top here. Open bracket, !DOCTYPE html, closed bracket, as those things are called. Angled brackets, which you've probably not had many occasions to type on your keyboard, but starting soon you will. Then after that, they start a pattern. So open bracket, html, closed bracket, and then all the way at the bottom is what we'll call the opposite of that tag. If this is a start tag, this will be an end tag, or if this is an open tag, this will be a close tag, differing only with this forward slash that's inside of the tag. So this says, hey browser, here comes a web page. This says, hey browser, that's it for the web page. Again, this sort of starting and stopping mentality. Meanwhile, inside of the web page as denoted by the HTML tag, there are two parts, a head and a body. The head of a web page tends to contain very little.
It's usually just like the title bar in the tab that we humans see when you visit a website, and the body is like 95% of the contents of the page, the actual viewport or the rectangular region that contains actual content. What is that content? Well, here in the head we have a title that's going to be "hello, title" just because, and then in the body of the web page, this web page, there's going to be "hello, body." That's it. That's HTML. If you save this text in a file, open it in a browser, you will see a really lame web page that says hello, title and hello, body, but that's a web page using HTML tags, as they're called. Anything in these angled brackets is a tag. So I can actually see this pretty clearly even on my Mac and you could do this on your PC as well. I've opened up TextEdit and I've configured it to be simpler than the default, so know that I've done a little something in advance, but you could use Notepad on Windows or any number of other programs, even Microsoft Word if you save it in the right way or Google Docs, but let me go ahead and just recreate this as open bracket, !DOCTYPE html, then open bracket, html, and just to kind of remember to do things, I'm going to tend to get ahead of myself and sort of start and finish the thought and then dive in inside. Let me go ahead and do head here, close head tag here, and I'm indenting, just for good measure, one, two, three, four tabs, though so long as you're consistent the browser will be perfectly content, as will we. Then open bracket, title, hello, title, close title, and then open bracket, body, close bracket, body, and then hello, body, inside of that. So that's it. I've just typed out the exact same thing as before. Let me go ahead and save this as not hello.txt or certainly not hello.c but hello.html by convention. I'm going to hit Save.
Mac OS is kind of warning me that this is text, not something called HTML, but I know what I'm doing and I'm going to say use HTML, and now I have a file called hello.html, and if I go to my desktop, here in fact it is. And if I double click on it, there, in fact, is that pretty simple web page and if I actually reveal the tab, there it is. Hello, title in the very top tab of the page and once I get rid of that do I see the body again. So that's it for HTML at least in terms of its basic structure, but there are some other features that we can take advantage of as well, and let's actually tease these apart. Notice, first of all, that there is indeed this symmetry. What is opened is almost always closed as well in the opposite order. Just as head here and title here, and then followed by body and then the contents therein, but because there is this structure, you can actually think about this in a relation to the past couple of weeks when we've talked about data structures. I would argue that this HTML on the left is kind of equivalent to this tree on the right, and we didn't spend a huge amount of time talking about trees, and even when we did we used them for algorithmic reasons like a binary search tree to search data pretty efficiently, but if you think about it, here is the document, which I'm just drawing with this shape here kind of arbitrarily and it has one child like the entire page as I'm drawing it, which is the HTML tag here. The HTML tag has two children, so to speak, to borrow our language from our data structures. So head and body from left to right. Head has a child called title and then title has a child of some sort, even though it's just raw text. It's not another tag with angled brackets, just as body has its own content there, just hello, body. 
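The tree described above can be built by a short program. This is a sketch using Python's standard `html.parser.HTMLParser`, not how a real browser is written. The `HELLO_HTML` text is a reconstruction of the page dictated in the lecture, the class and function names are made up, and the code assumes every start tag has a matching end tag, as in this page.

```python
from html.parser import HTMLParser

# Reconstruction of the hello.html page described in the lecture.
HELLO_HTML = """<!DOCTYPE html>
<html>
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class TreeBuilder(HTMLParser):
    """Build a nested dictionary mirroring the nesting of the tags."""

    def __init__(self):
        super().__init__()
        self.root = {"name": "document", "children": []}
        self.stack = [self.root]  # the element we are currently inside is last

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()  # whitespace between tags is only for humans
        if text:
            self.stack[-1]["children"].append(text)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    indent = "    " * depth
    if isinstance(node, str):
        return [indent + repr(node)]
    lines = [indent + node["name"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


tree = build_tree(HELLO_HTML)
```

Running `print("\n".join(outline(tree)))` shows the document with one child, html, which in turn has head and body as children, with the title and the two pieces of raw text as leaves.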
So that hierarchy and the deliberate indentation, which is there just for us humans-- the browser does not care about whitespace-- lends itself to an implementation in memory, and so long story short, when your browser receives an envelope, inside of which are not just those HTTP headers, outside of which are not just the IP address and TCP port, but inside of which is a text file containing HTML like that, all the browser does is load that file into memory, read it top to bottom, left to right and essentially build a tree structure in memory so that it knows how to represent it underneath the hood, so to speak. And in fact, you've seen HTML all around you even if you've just never looked underneath the hood, as we say. In fact, if I go to like harvard.edu and let the redirects happen in the usual way, let me go ahead and inspect the page. This is another way in Chrome and in other browsers to get at the developer tools. You can control click or right click on the web page and choose Inspect. That opens up the same tab. Previously, we used the network panel, but if I click on Elements you can actually see all of the HTML that composes Harvard's page, and it looks beautiful here. It's nicely color-coded. It's prettily indented. I can dive in deeper with all of these arrows, but that's probably not how the humans made it because if I also right click or control click and choose View Page Source, and you can do this in any browser as well, here is the mess that actually came back from Harvard's server. This is HTML and my god, like, it's a lot. I see no indentation, so style 0 here, but that's OK because it's a browser reading it. 
It's not a human in this case and similarly, if we visit something like yale.edu, and let's go ahead and open up their page source, it's similarly going to be kind of overwhelming and a lot of it, but rest assured that even though these web pages might look really, really sophisticated-- like, my god, we've never written a C program with 500 plus lines of code-- a lot of this stuff is generated, and in fact, one of the challenges of pset7 and pset8 when we explore web programming is going to be not to write hundreds of lines of HTML, which would just get mind numbing quickly, but to write a few lines of Python or a few lines of JavaScript that programmatically, like with loops, generates all of the structure of your web page. So if it's like a web page of photos like a Facebook photo album, Facebook doesn't have people writing out thousands of lines of HTML code every time you upload a photo. They have code in PHP or some other language that has a for loop that iterates over all of the photos you've uploaded and spits out the same HTML but different image for each of the photos you've uploaded, and that's where web programming comes into play. You're not writing the HTML, you're generating it by actually writing programs. So today we set the stage for that capability but first we just need a framework for actually doing this. So rather than use, now, my local Mac, which is kind of lame because I can open the web page but no one else in the world can access it, and in fact, if we do that again, you'll notice here, if I double click on hello.html and open the URL bar, it's curiously clearly not on the internet. Like, it's not http, it's not https, it's literally file://, which just means it's a file on my local computer. So none of you could reach that because of course this user jharvard on my laptop exists only on my local Mac. So fortunately we have a web-based IDE with which to put stuff on the internet, but there's a catch. 
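The idea mentioned above, of a loop that spits out the same HTML with a different image each time, might look like the following sketch. It is not code from the lecture or from Facebook. The function name is made up, and the `img` tag with `src` and `alt` attributes is standard HTML that has not been introduced at this point in the lecture.

```python
from html import escape


def photo_album(title, photo_urls):
    """Generate a whole page, with one image tag per photo, using a loop."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "    <head>",
        f"        <title>{escape(title)}</title>",
        "    </head>",
        "    <body>",
    ]
    for url in photo_urls:
        # Same markup every time; only the image address changes.
        lines.append(f'        <img alt="photo" src="{escape(url, quote=True)}">')
    lines.append("    </body>")
    lines.append("</html>")
    return "\n".join(lines)


page = photo_album("album", ["a.jpg", "b.jpg", "c.jpg"])
```

Three file names in the list produce three image tags, and a list of a thousand would produce a thousand, without anyone typing the HTML by hand. The `escape` calls keep characters such as `&` or quotes in the inputs from breaking the markup.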
The IDE itself, recall, is a web application, right? It's code that friends at Amazon wrote and that we added to, that runs on a server somewhere and, as we'll see, somewhat in your browser too, but more on that when we talk about JavaScript, but CS50 IDE already has a URL like https://cs50.io or https://ide.cs50.io slash whatever your username is. So we're already using port 80 or maybe 443 for the IDE itself. So how in the world could you write web pages in the IDE and then serve them on the internet if the IDE itself is already using the standard port? Well, fortunately you can write on the envelopes, when trying to access your own web pages, a hardcoded TCP port number. It doesn't have to be 80, it doesn't have to be 443. Those are just the defaults. If I want to actually visit pages in my IDE, I can just run a web server on a different port number, like 8080 by convention or 8081, 8082. Just a pretty big number that odds are no one else is using on some system. So let's see this as follows. Let me go ahead and in the IDE here create a new file. I'm going to call it hello.html and I'm just going to go into that text file, whoops, which I closed. Let me go ahead and just grab the code that we've been using here, which is right here, go back to the IDE, paste it into the text file here, click Save, and now I have in the IDE a file called hello.html, and indeed if I look at the file browser and I look on the left-hand side, there, in addition to the sample code, is hello.html, but if I double click this file it's not very useful because it's going to open the editor, which is not like a web page. It's the source code for my web page. So I actually now need to run a program that serves this file just like Facebook does, just like Google and Harvard and Yale do, and I'm going to do this literally by running http-server, and I'm going to say on port 8080.
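The lecture uses a program called http-server for this step. As a rough stand-in, and only as a sketch, Python's standard `http.server` module can also serve the files in a directory on a chosen port. The `start_server` name is made up, and this version deliberately listens only on 127.0.0.1, so unlike the IDE setup it is reachable just from the same machine.

```python
import functools
import http.server
import threading


def start_server(directory, port=8080):
    """Serve the files in `directory` over HTTP on the given TCP port.

    Returns the server object; call shutdown() and server_close() to stop it.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run in the background so the caller can keep working while it listens.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

After `start_server(".", port=8080)` in a directory containing hello.html, the page would be at http://127.0.0.1:8080/hello.html. Choosing 8081 or 8082 instead works the same way, which is the point of being able to write a different port number on the envelope.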
So -p in this particular program means port and I'm just going to say, hey CS50 IDE, start a program called http-server whose purpose in life is to listen for requests on the internet, but specifically on that port number, and serve up responses to whatever requests come in. So I've gone ahead and hit Enter here. Starting up http-server. It tells me the long URL that this is available at. Your URL will be a little different with your username and if I open this now in another tab, it's a little cryptic at first glance. I'm just seeing the index or contents of my directory and in there is like a secret .c9 for Cloud9 directory. Don't delete that or change that. That just has metadata related to the IDE. Source6 I downloaded earlier and you can too from the course's web site, but there's hello.html, and on the left-hand side here, you'll see some cryptic looking permissions. This has to do with who can read and who can write your files, but for today all I care about is that the file exists. So now, like a user on the internet, I'm going to go to here, click on it, and voilà! There is my actual web page. So notice, the URLs are very similar. Here I am on cs50.io and here I am on cs50.io even though your user names will of course be different, but the IDE is running on the default port, 443. I'm now temporarily serving up my HTML files using port 8080 just because, and so that's how a server can do multiple things and how you can do multiple things on the server at once. So let's do something else besides that. Let me actually introduce a few other fundamentals that might be handy when writing HTML and let's go ahead and do this. Let me go ahead and create a new file and we'll call this one paragraphs.html, and let me go ahead and just name this like paragraphs and down here I'm going to have some paragraphs of text, and I don't really know what I want to say so I'm going to Google some-- so standard Latin-like text.
Oh, I want like three paragraphs of Latin-like text and so here we go. Then there's a random website that just generates placeholder text in faux Latin. So, Paste. There are my three paragraphs. I'll be a little nice and tidy and indent them so it looks at least somewhat nicely styled. Save the file and now let me go back to the URL I was at a moment ago. Now notice I have two files being served by this HTTP server program. Click paragraph-- oh. OK, one, Chrome thinks the page is in Latin. [STUDENTS LAUGH] Actually, soccer inferior element estate planning time. Tomorrow soss quiver before as the-- that does sound like the Latin I learned years ago. All right, so Show Original. So the point is not to focus on the Latin, but the apparent bug. Like, what's it not doing that maybe you thought it should a second ago? AUDIENCE: No indentation. DAVID J. MALAN: Yeah, there's no indentation and also there's no what? There's no break. I mean this is one big Latin-like paragraph. It's not three. Well this is simply because a browser only does what you tell it to do. Let me go ahead and shrink this window and, as an aside, what you're seeing here, all this mess in the bottom terminal window, as the httpserver program is running, it is logging all of the HTTP requests that come in from browsers just so you can kind of debug or diagnose, but we're going to just ignore that for now and let this thing run down here in the background. But if I want paragraphs I need to be a little more pedantic and actually say, hey browser, make a paragraph with what's called the p tag, and let me go ahead now and indent even though the indentation clearly doesn't matter. It's just to keep my code nice and tidy. So, hey browser, start a paragraph. Here's the text. Hey browser, stop the paragraph. Same thing here. Let me go ahead and start a paragraph. Then let me go ahead and stop the paragraph. Notice the IDE is trying to be helpful. This is not helpful. 
This is not a password, but it's trying to autocomplete my thoughts. That's fine. I'm just going to ignore it. Then let me go ahead and close the paragraph and save. So it's a little more verbose, but anything in the tags the human is not going to see, but when you reload the page, as with command or control+R, or if you go up here by clicking the reload icon, whatever it looks like in your browser. Now I have three Latin-like paragraphs. So it's a little more deliberate here. So that's all fine and good, but the web is kind of more interesting when you can actually link to things. So let's actually do that instead. Let me go ahead and create a new file called, let's say, link.html. Go ahead and paste this here and say we'll name the title link. Let me get rid of all of this just so I have some placeholder and I can say something like "Hello, world! My favorite school is..." and just to play it safe today, "stanford.edu." Save, reload, click link.html and nothing. So here too it looks like a domain name and it certainly is, and frankly, all of us now are probably conditioned in tools like Slack and Gmail and other tools and Facebook that just kind of figure out that, oh, if something looks like a domain name, make it a link, but that's because someone at Facebook, someone at Google knows HTML and knows how to use if conditions and elses and just says, oh, if a string that the human has typed in looks like a domain name ending in .edu, make it a link. But how do you make it a link? We can now do this manually. It turns out you need an anchor tag abbreviated as a and then I'm going to close the anchor tag at the end of the text that I want to anchor a link to, but this isn't enough. I need to be ever so explicit as to where I want this link to go, and so it turns out HTML also supports what are called attributes. So tags are the things in angled brackets. 
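The paragraph fix described above can be sketched roughly as follows. This is a minimal reconstruction, not the lecture's actual file: the title follows what was said, while the filler sentences are placeholder text assumed for illustration.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Each <p> ... </p> pair tells the browser where one paragraph starts and stops. -->
        <!-- The indentation is only for the human reader; the browser ignores it. -->
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        </p>
        <p>
            Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
        </p>
        <p>
            Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.
        </p>
    </body>
</html>
```

Without the p tags, the same three blocks of text would run together as one paragraph, because line breaks and indentation in the source collapse into single spaces when rendered.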
Attributes are also inside those angled brackets, but they come after the tag's name, and they're just going to modify the behavior of the tag, and it makes sense here to need to modify the behavior because 20, 30 years ago when HTML was invented, we didn't make up a tag that leads to stanford.edu. We made up a more generic tag that anchors to some destination, and so here I can now add an href attribute whose value is the URL for www.stanford.edu, save the file, and notice, this is like saying to the browser, hey browser, here comes a link or hyperlink to Stanford's web site, and then at the end here it says hey browser, that's it for the link, and thankfully it's not super verbose. You don't have to repeat the attribute at the end. You just repeat the tag's name, otherwise you'd be typing the same thing again and again. If I now go back here and reload the page as with command or control+R, now it becomes the familiar blue, underlined link, and if I hover over that, notice first, it's super small, but you can see where the link is actually going to lead, and so if I click on this we'll see Stanford's website and voila. So now we've visited their page as well, but there's an interesting side note here, and if you want to kind of think about things called phishing attacks or frankly, Harvard once in a while and Yale once in a while will email out warnings like "beware of this phishing attack." P-H-I-S-H-I-N-G. This is when people on the internet generally send you emails or some kind of spam trying to trick you into visiting a phony website to harvest your usernames, passwords, credit card numbers and whatnot, and honestly, most of those phishing attacks boil down to this 10-line example of HTML because what's to stop me from saying something like "Hello, world! Confirm your password at..." and then we'll say like paypal.com and then over here, I can change this to like davidsphishingsite.com, which hopefully doesn't exist.
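A rough sketch of both links discussed above, the honest one and the misleading one, might look like this. The https scheme on the Stanford URL is an assumption, and phishing.example is a stand-in on a reserved domain for the made-up phishing site named in the lecture.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- href is an attribute: it modifies the generic anchor tag by giving it a destination. -->
        <!-- The closing tag repeats only the tag's name, never its attributes. -->
        <p>
            Hello, world! My favorite school is <a href="https://www.stanford.edu/">stanford.edu</a>.
        </p>

        <!-- The visible text and the destination are independent of each other. -->
        <!-- This link reads "paypal.com" but leads to the (hypothetical) phishing.example. -->
        <p>
            Hello, world! Confirm your password at <a href="https://phishing.example/">paypal.com</a>.
        </p>
    </body>
</html>
```

Nothing in HTML ties the text between the tags to the value of href, which is why checking the destination shown by the browser is the only way to see where such a link really goes.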
One year I went to badplace.com and-- anyhow, so-- [STUDENTS LAUGHING] Here I've gone ahead and saved the file, reloaded, and the link is indeed blue, but before I click on it, only the most astute of users is going to even bother checking the bottom left hand corner to see where they're about to be whisked away to and even most of us in this room, myself included, are not so paranoid that we're constantly checking those kinds of things. Odds are, if I get an email like this, oh my god, my account's been compromised. I've got to go confirm my password for PayPal to protect my money. You might very well just follow the link, but of course it can go anywhere you want just via this very basic building block, but this is just one way you can vet actually what's going on underneath the hood, but of course the internet is more interesting than just text alone. Let me go ahead and open up an example that I whipped up in advance here using image.html and we'll see another tag here. So here is another opportunity to use an attribute and one that's also not necessarily visible to the user. So here's an image tag. Humans years ago decided to be succinct. It's <img> for image, just like it's just <a> for anchor. The source, src, of which is going to be that file, dan.jpeg, which I downloaded in advance from the URL up above, and in fact, this is gray in the CS50 IDE because it's syntax highlighting it just like in C. This is what's a comment in HTML. So if you want to make notes to yourself or to viewers, some sentence or like a citation like this, you can use an HTML comment by doing <!-- at the start and --> at the end, and you can write anything between those things-- for the most part-- that you want. So just as in C we have the //, here we have this. So here's the source of this image and this is like an alternative explanation of it, alt. Why might this be compelling? I want to show the image to a user. Yeah? AUDIENCE: Is it for like if they hover their mouse over it, they can see what's happening. DAVID J.
MALAN: Yeah, so a couple of reasons. If you hover over the image you can actually see some descriptive text. So like Handsome Dan here, like Yale's mascot. If the user has trouble seeing or is blind, you might need a screen reader to actually tell you what it is that's on the screen, and it's not obvious from dan.jpeg what that could be, but if you have this alternative text, a computer can recite verbally Handsome Dan, which might then jog the person's memory as to what it is that actually on the screen. Or if you have a really slow internet connection, sometimes you'll see a placeholder for an image that just says what it is before the image actually downloads. So being mindful of these kinds of things will just make, ultimately, your websites more accessible, and indeed if I go to this one now and go into my source6 directory where we have even more examples at our disposal and go to Image 6, here is their adorable Handsome Dan as of this past year. So there's an image. We can kind of do funky things now with nesting. So this is not all that interesting because it doesn't go anywhere, but I could just combine these ideas. I could do a href = http://www.yale.edu or, because I don't want the user to bother getting redirected, I could just proactively make it secure because I know Yale supports that per earlier, and I can nest these tags like this. Now if I go here, reload, it still looks the same but notice my cursor changes to like a pointer, and if indeed I click on that, now the image leads to Yale's web site, but I skimmed over something. One of these is not like the other. What detail have I kind of not mentioned? Yeah. AUDIENCE: The image file closes within itself. DAVID J. MALAN: Yeah, the image tag kind of closes in and of itself, and so there are some of these anomalies within HTML where there really isn't a notion of, like, start doing something and then eventually stop doing something. Like, an image is either there or it's not. 
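Putting the image, its alternative text, the comment syntax, and the nesting inside an anchor together, a minimal sketch might read as follows. The file name dan.jpeg and the alt text are taken from the lecture as transcribed; the comment wording and the exact Yale URL are assumptions.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- A comment like this one is a note for people reading the source; it is not rendered. -->
        <!-- The anchor wraps the image, so clicking the picture follows the link. -->
        <a href="https://www.yale.edu/">
            <!-- img has nothing between an open and a close tag, so it closes itself. -->
            <img alt="Handsome Dan" src="dan.jpeg"/>
        </a>
    </body>
</html>
```

One caveat worth hedging: alt is what screen readers announce and what appears when the image fails to load, but in current browsers a hover tooltip generally comes from a separate title attribute rather than from alt.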
Like, you can't kind of put something in between it conceptually, and so some of these tags in HTML are what are called empty. Like, they should not have anything after the open tag or before the close tag. So if you wanted to be really sort of precise you could say this, but you should not put anything where my cursor now is because it would make no sense to try to put something inside of an image, but this is just kind of lame to have this unnecessary verbosity. So you can just put the slash in there and technically in HTML5 you don't even need the slash in this case, but at least this way, and I think for pedagogical purposes, doing it, even for empty tags, makes clearer, visually, when and that your tags are balanced. So that's the only anomaly there and then there's bunches of others which we can fly through really quickly here. So if I go back to our examples here, I whipped up headings.html. So if you want to do something like this if you're writing like a book or a website that has like chapters and sections and subsections and so forth, HTML lets you easily format things as big and bold, slightly smaller and bold, slightly smaller and bold, and so forth by using the h1 through h6 tags. So if I go into headings, this is how I made this web page. I simply have h1, h2, h3, h4 opened and closed and that's it. So any time you're reading some kind of online text, odds are they're using one or more of these tags to format the page. If we look at another example in here, we have something like list.html.
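The heading tags just described can be sketched like this; the heading text is placeholder wording assumed for illustration.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the largest heading; each level down renders a bit smaller by default. -->
        <h1>One</h1>
        <h2>Two</h2>
        <h3>Three</h3>
        <h4>Four</h4>
        <h5>Five</h5>
        <h6>Six</h6>
    </body>
</html>
```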
Lists are not uncommon on the internet, you'll never believe number three, and here's how you might do something with a bulleted list by just marking up three words-- foo, bar and baz-- and the HTML for this, if I open up list.html, simply looks a little more verbose in that we need a parent element so to speak, borrowing our tree terminology, but here we have an unordered list, or ul, each of which has one or more list items, or li, each of which open and close foo, bar and baz. And if I really want it numbered, I can also do this. I can change unordered list to ordered list, ol, reload and now the browser figures out the numbering for me, which is nice if you have lots of data and you don't want to deal with actually laying it out yourself. Meanwhile, we can go one or two steps further before we actually get to something functional. Here is kind of the most complicated of all, but it too just kind of tells the browser what to do. So before we look at the result, this says, hey browser, here comes a table, like tabular data. Rows and columns like Excel or Google Spreadsheets. Hey browser, here comes a table row, or tr. Hey browser, within that row, here comes some table data, a.k.a. a cell or column. Here comes another cell. Here comes another cell. So that's one, two, three cells in a row. Hey browser, here comes three more cells. Hey browser, here comes three more cells. Hey browser, here comes three more cells and if we actually render this in the browser, you can see the layout of a sort of old school phone pad on your phone. It's not very pretty, it's not very well formatted, but if we zoom in you really do see that it is lined up in rows and columns as I sort of verbally implied, but this is all very kind of underwhelming. Like, Google is cool because you can go to it and you can actually search for cats and find lots of cats on the internet, but how is it that this actually works? So, aww, bad news today. OK, so we'll just zoom in on this one. 
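The list and the phone-pad table described above might be marked up roughly as follows. The words foo, bar and baz come from the lecture; the exact cell contents of the table are an assumption based on a standard phone keypad.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>list and table</title>
    </head>
    <body>
        <!-- ul is the parent element; each li is one bulleted item. -->
        <!-- Changing ul to ol makes the browser number the items instead. -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- tr starts a table row; each td is one cell of data within that row. -->
        <table>
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```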
OK, so let's try to focus on the pedagogy here-- of cats-- as follows. Let me go ahead and focus on really the URL, which is kind of long and cryptic, but let me just throw away honestly anything that kind of looks confusing or I don't understand. I have no idea what source means so I'm going to get rid of that. I have no idea what the rest of this means. I'm going to get rid of that and I'm going to try to distill-- granted, with some foresight because I knew how Google works here-- I changed the URL to something much, much, much simpler: www.google.com/search?q=cats. It seems that, somehow or other, Google's behavior is controlled by information that's conveyed in the URL, and it's not just that I'm searching. It's that I'm searching for cats. So in fact, on a whim, I'm going to search for dogs instead and hit Enter, and indeed a few things change. We have all these dog images appear here on the right. We have the text pre-populated up here and we can search for any number of other things here, like Harvard Yale prank 2004, Enter, and there you have a Wikipedia article on the video we saw earlier. So it seems that you can parameterize the behavior of Google just by understanding how this URL works. So here is kind of the path that's being requested, the file or folder or whatever that is. A question mark says, hey browser, or hey server, rather, here come some HTTP parameters. Some inputs from a human who's either filled out a form or apparently is kind of hacking the URL bar here, and then the name of the parameter comes next. q, meaning query, and this is what Larry and Sergey decided years ago for their search box, an equals sign, and then whatever it is the human typed in. Now it got a little funky here quickly. Now you see %20. That is the web's way of encoding a space so that it's not a physical space, it's all one contiguous string. So it's just one contiguous string for the server to actually look at or read, and so why is this useful?
Well it turns out I can leverage this information and kind of implement my own Google pretty easily. Let me go ahead and go into search.html, one of the other examples I whipped up, and you'll see another tag altogether. Inside of the body of this page is an HTML form tag, and the form tag takes a couple of attributes I know. One is action, which is the URL to which you want to send the form's information, and the other is the method that you want to use. Now it's a little inconsistently lowercased here just because, but we did see that verb before. Where? Where did we see this verb? This was like the somewhat arcane message that was going, supposedly, inside one of these envelopes when we said GET in all caps, then / HTTP/1.1 and so forth. So it seems that if you want, as the web developer, to create an HTML form that has text boxes and maybe checkboxes and dropdown menus and so forth that submits its information when the user hits Enter or clicks a button to this address, and you want it to go inside of a virtual envelope using that GET verb, you literally just say method="get". And then down here I seem to have two inputs, one of whose names is q, the type of which is a text box, and the other of which is a submit type, whatever that is, the value of which is search. Now you would only know what these things mean by seeing them demoed or looking at some online reference, but if we pull this up to see the results we have a super simple-- and I'll zoom in-- very, very simple version of Google, right? It doesn't even have the logo, but it does have, I claim, all of the functionality because watch what happens if I type in, for instance, whoops, birds and click Search. Oh my god, I implemented Google with just like 15 lines of code, but not really, right? Like, I've implemented the front end of Google, which-- I've got to start Googling these things in advance. OK, uh, these are very sad stories. [STUDENTS LAUGH AT MORBID NEWS HEADLINES] DAVID J.
MALAN: OK, so the point though is, the point-- look up, look up. The point is that the URL is what I generated. So using those HTML tags coupled with the human's cooperation and actually clicking a button did I then generate this URL, whisk the user away from the IDE to google.com, where Google is handling the back end, like all of the hard work, actually checking their database, rendering the HTML, but I made the front end, the user interface via which you can actually interact with Google's search engine there. And it boils down to just these basic heuristics, but of course this is a pretty ugly search engine, right? Black and white text box, a gray button and that's it. Like, even Google, simple though it is, has a little bit of style and color to it and things are centered and kind of spaced differently. So there's an art to this ultimately and indeed being a web designer in itself is a profession and in fact, you'll find in industry that some people are good at front end design. Some people are bad at it. I'm among the ones worse. Like, my web pages look like that search box just a moment ago, but some people really prefer the non-graphical stuff, the back-end, the database stuff, and indeed one of the takeaways over the next few weeks will be for you to figure out for yourselves if you like any of this at all certainly, but also like what your preferences are. And you might hear terms in industry these days like front-end developer, back-end developer. That just means do you work on what the user sees in their browser or app or do you work on the back-end, the database stuff that's really important and sometimes quite difficult, but that the user doesn't interact with directly. Or are you a full-stack developer, which means you just do all of this, which all of you from CS50 are effectively, albeit after just one or so semesters of background. So how do we start, though, to make things prettier? 
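The search form being discussed can be sketched as below. The action URL, the parameter name q, and the button label follow the lecture; the title and the capitalization of the label are assumptions.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action is where the form's data goes; method is the HTTP verb used to send it. -->
        <form action="https://www.google.com/search" method="get">
            <!-- The name "q" becomes the parameter name after the question mark in the URL. -->
            <input name="q" type="text"/>
            <!-- A submit input is the button; value is the text shown on it. -->
            <input type="submit" value="Search"/>
        </form>
        <!-- Typing birds and clicking Search requests https://www.google.com/search?q=birds -->
    </body>
</html>
```

This page is only the front end: the browser builds the URL from the form's inputs, and everything after that, such as looking things up and rendering results, happens on Google's servers.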
Well it turns out that HTML, for the most part, is just a markup language. It's for structuring a web page and semantically tagging things, and by semantically tagging things I mean like, hey browser, here's the head of my page and that's a concept, semantically. Hey browser, here's the body of my page, and that too is a concept, semantically. I didn't say anything about bold facing or font size or colors or all this stuff that's important for a good user experience, or UX, but that can be decoupled from HTML, and in fact, one of the challenges as you learn HTML for the first time is that, as you make your way through various online resources and references, they will sometimes combine these ideas. So, again, today we'll focus not just on correctness, getting things to work, but design as well. So here, for instance, is a super simple web page for someone named John Harvard that has a header and a main part and a footer, and header is distinct from head. It's sort of poorly named here. Head of the web page is just the tab bar and other such things up top, but semantically you might have a page with like three parts. Like the header, like the title in the body of the page itself; the main part, where the actual contents are; and then a footer, like a copyright symbol or something like that. So this might be a general division of a page, but notice I've styled it a little differently. Let me go ahead and open this up in a browser as I did just a moment ago and go to, sorry, I'm going back through my entire internet history here. Let's go ahead and open this up just as we did before at this URL so that we can go ahead and open up CSS0.html. Notice that, oh, this is already marginally better than the pages we've looked at before if only because it's centered, which is a step forward from everything just being left. The first line is a little bigger. The second line is kind of medium and the bottom line is the smallest.
So there's a little bit of style here, but not all that much. So how did I actually do this? Well take a look at the code here. I have added, now, a style attribute to several of my tags. So the header, the main and the footer really aren't styled in any specific way. They're just a way of telling the browser this is the important stuff for the title, this is the important stuff for the main part, this is the important stuff for the footer, but the stylization or aesthetics come from this yellow text here, thanks to the IDE syntax highlighting it, and notice this text follows a different pattern. Up until now, we've been using angled brackets and words and equals signs and quotes. Now, inside of those quotes, we also have another pattern when you're using this second of two languages today, CSS. font-size: large is the stylization for this particular element's content. text-align should be center. These are two CSS properties. CSS, cascading style sheets, and we'll see what that means in a moment, but this is just how you configure the style of those elements, and indeed that's why one is a little bigger and then a little smaller and then even smaller because, notice, I did font-size: large, font-size: medium, font-size: small. All right, but as we've often done, let's iteratively improve upon this. Even if you've never seen HTML or CSS before, there's some poor design manifest in this simple example. What might you say seems wrong or seems a little copy paste-like? Yeah. AUDIENCE: They're all centered [INAUDIBLE].. DAVID J. MALAN: Yeah, they're all centered and I literally like copied and pasted that CSS property, its key value pair, its name and value, again and again and again, but remember the hierarchy of HTML and the DOM, Document Object Model, the tree we drew a little bit ago. All of these elements-- header, main, and footer-- have a parent element called what? AUDIENCE: Body. DAVID J. MALAN: Yeah, body.
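As a sketch of the first, copy-paste-heavy version being critiqued here, the page might look like this. The CSS properties and values follow the lecture; the visible text inside each element is placeholder content assumed for illustration.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>css0</title>
    </head>
    <body>
        <!-- Each style attribute holds CSS: property names, colons, values, semicolons. -->
        <!-- Note that text-align: center is repeated three times. -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```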
So one level higher, which is indented this way or in the tree is higher up in that family tree-like drawing, all of these are children of body. So why don't I just move or factor out text align center into the elements above it? And herein lies the cascading of CSS. Cascading style sheets means that if you have a property up here, it will cascade down to all of the children and descendants below it and it means another thing, too. You can even override these properties somehow, but we'll see that before long. So if I go ahead now and open up CSS1.html, notice that I did exactly that improvement. The code's a little tighter now. It's fewer characters, easier to maintain because now if I want to change it to left or right or center, I change it one place, not three. And so this is kind of consistent with some of our design takeaways from C and indeed, if I visit this page, CSS1.html, it looks the same, but it's better design underneath the hood. But we can do a little better still. If I open up CSS2.html, notice that I've done this. I rather like this design now because it's even more succinct. I'm not using the style attribute anymore. I'm using a different attribute called class, and class is kind of a way to define-- much like a struct in C lets you define your own data types, a class in CSS allows you to define a name for a whole bunch of properties, and so here I just said let's call this class large, medium, and small, and I don't know what those mean, and frankly I might be working with a friend who's much better at design than I am so I'm going to let him or her actually define these meanings. I'm just going to kind of tag things in this way semantically, but if we scroll up in this file, you'll see that for now I have no such friend, and so I implemented it myself, and here's, for the first time, one other thing in the head of the page. Up until now, we've just had the title, but it turns out you can have a style tag. 
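Factoring the repeated property out into the parent, as just described, might look like this. As before, the visible text is assumed placeholder content.

```html
<!DOCTYPE html>

<html>
    <head>
        <title>css1</title>
    </head>
    <!-- text-align is set once on body and cascades down to header, main, and footer. -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

The page renders the same as the version with three copies of the property, but changing the alignment is now a one-place edit.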
Not just an attribute, but a style tag inside of which, it's a little cryptic at first glance, but there's some pattern here, clearly. You have all of those properties, but the new syntax here is that if you want to define a word called centered, you literally do a period and then the word centered. If you want a word like large, you say .large. So it's similar in spirit, though not quite the same as like typedef in C, but you say .centered, .large, .medium, .small. You use our old friends curly braces, which we will only see in CSS, and this just defines one or more properties to be associated with that new keyword. And so, if we scroll down here to the bottom, you'll see that I centered the body. I made large the header, medium the main, and small the footer, and the result is going to be exactly the same. Very underwhelming, but again, marginally better design because now we are just one step away from really improving this. If I do finally have that friend, it's not going to be very easy to collaborate, ultimately, if we're both working on the same file and moreover, it seems unnecessary to introduce these semantics. Like, why do I have to have tags like header and main and footer and classes called large and medium and small and centered? Like, why don't I leverage the names of these tags themselves? And this is where HTML can be pretty powerful. Notice I've simplified some of my CSS up top. I've dropped the period, which was like typedef. Like, give me something called large, give me something called medium. Now I'm just saying literally a word, but those words are identical to what? AUDIENCE: The tags. DAVID J. MALAN: The tags themselves.
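The class-based version described above can be sketched as follows. The class names centered, large, medium, and small follow the lecture; the visible text is assumed placeholder content.

```html
<!DOCTYPE html>

<html>
    <head>
        <!-- A style tag in the head defines classes: a period, a name, then properties in curly braces. -->
        <style>

            .centered
            {
                text-align: center;
            }

            .large
            {
                font-size: large;
            }

            .medium
            {
                font-size: medium;
            }

            .small
            {
                font-size: small;
            }

        </style>
        <title>css2</title>
    </head>
    <!-- The class attribute applies a named set of properties to an element. -->
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```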
So preexisting tags, if I just mention them by name without a period, which gives me a new name-- I just mention the body, the header, the main and footer, and then, inside of the curly braces, define my properties, now I can just stylize the actual tags as they exist in my page, and this now looks like really readable, maintainable HTML. There is no aesthetics associated with the markup language here, but rather there's useful tag names that come with HTML-- you can't just make up your own tags. They're in, sort of, the documentation, but now it's just much more readable, and this might look different on my phone or your phone or your laptop, but my friend who's good at stylization can figure out how to style all of these things, and better yet, he or she doesn't even need my file. In the fifth example here, notice that's it for the page. We've gotten rid of the big style tag and replaced it apparently with what? AUDIENCE: Href, a link? DAVID J. MALAN: Yeah, link href, which is a horrible, horrible name because it's not like a link in the page and hyperreference was already used for a link in a page, but this is what we're stuck with. This just says, hey browser, include this CSS file that is elsewhere on the server. The name of this file is arbitrarily CSS4.css because this is our fifth example here-- zero index. The relationship of this file to this page is that it's a style sheet, which is just a list of aesthetics or properties that should characterize its layout and indeed, if I open up CSS4.css, I just copied and pasted everything in there, but this is nice now in principle, even though we're just creating work for ourselves today, because now I can share this file with someone else. He or she can work on it on their own. Then we can merge our work together because my work's in the HTML file. Their work's in the CSS file. Better still, if we're making a whole website that has a dozen pages or 100 pages, consider this. 
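A rough sketch of the last two steps, selecting tags by name and moving the rules into a separate file, might look like this. The file name is written in lowercase here, which is an assumption about the actual file, and the visible text is placeholder content; the rules listed in the comment are what the separate stylesheet would be expected to contain.

```html
<!DOCTYPE html>

<html>
    <head>
        <!-- link pulls in a CSS file from elsewhere on the server; rel says it is a style sheet. -->
        <link href="css4.css" rel="stylesheet"/>
        <title>css4</title>
    </head>
    <!--
        css4.css (a separate file) would hold rules that select tags by name, with no period:

        body { text-align: center; }
        header { font-size: large; }
        main { font-size: medium; }
        footer { font-size: small; }
    -->
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

The markup now carries no aesthetics at all, so one person can edit the HTML while another edits the stylesheet, and any number of pages can share the same file.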
Just like in a C header file, I can include bitmap.h in all sorts of programs. Similarly can I include CSS4.css in all of my web pages. So if I want to change the font size or the layout or whatever in all of my website all at once, I change in one place, not in every darn web page that might have been created by me or by someone else, and so there's just that maintainability to it too, but we can do even better than that because even the CSS we're looking at here is only so good, and what's really nice is if we go to Bootstrap-- let Google tell me where to go. We're safe. OK, so Bootstrap is a library-- formerly from Twitter, now a much larger community-- that's a whole bunch of CSS libraries. So just as in C, we have code and functions that other people wrote. So in the world of web development do we have code that other people wrote and we use that for JavaScript and Python, but even for aesthetics are there sites like Bootstrap and other popular things that allow us to make our sites prettier and build them more quickly without having to reinvent wheels. So for instance, if I go down to let's say Content and I go to Typography and skim through here, you'll indeed see like h1, h2 and h3, but if you want things even bigger than that there's like a display heading. There's this fancy version, which has a fancy display heading with some faded secondary text. So pretty marginal, but I don't have to figure out how to do that now myself. If I want to actually have tables, I can do much prettier tables than I did with my little old school phone pad a moment ago. Like I can make things different colors. I can shade the columns like this and in fact, you can do even fancier things. If I go ahead and open up a web page and go to our big board for speller.cs50.net, you'll see that this is a pretty good looking table as tables go.
Certainly much better than the one before, but that's because we're using the Bootstrap library, and even more compelling than the aesthetics are that suppose that you visit speller.cs50.net on your phone, it starts to get pretty ugly once your window gets smaller, but notice stuff can just disappear magically when you're on a mobile device or, in this case, simulating it by using just a smaller browser window. So using CSS and the aesthetic power that it provides, we can also dynamically change our files to just render differently on different devices, and then lastly, let me open up, for instance, this under Components. This is where the really juicy stuff is. If you want fancy alerts to yell at the user or say everything is OK, you get nice little colored boxes like this. The forms are much prettier. I mean, already this looks much more like the web you and I use and not the mess of a form that I created a moment ago and long story short, just like in C it's pretty easy to include these things in your own site, so can I do this. Let me go ahead and open up form0.html, and this is literally an approximation of the very first web application I made, even before web application was a phrase, in 1997. I had taken CS50 and CS51. I hadn't learned web stuff at the time. I just kind of taught it to myself and learned from some friends and the first thing I did was build an interactive website via which first years could register for intramural sports because literally that year in 1996 it was paper-based. You'd walk across the yard, open up Wigglesworth, one of the dorms, slide a piece of paper-- old school-- under the door and you were registered for a sport. We could do better even in 1997, and so we did it with the web, and so this form0 back in the day looked a little something ugly like this, but there's a text box where you could type in your name and then there's the dorm where you could select Matthews. 
So I could actually do David Malan and Matthews and then click Register, but we don't yet have the ability to make back ends. So this form goes nowhere for today, but you at least get these kinds of aesthetics, which are kind of 1997 aesthetics, literally. But if we go into this other example, form1.html, it looks pretty, pretty better now. It's maybe a little big in retrospect, looking at the display font, but all I've done is now use this Bootstrap library, and notice, it's a little hard to see on the projector here, but everything's kind of like nicely outlined. There's like Mark Zuckerberg sample text there which we can override by actually typing in our own email address here. We have a prettier looking box, a prettier looking button, and that's just because if we open up, as down here, form1.html, notice that in addition to my HTML down below and in addition to a couple of other things that I've added to make things more mobile-friendly in particular, I just added this. I read the documentation on getbootstrap.com and I went ahead and added Bootstrap's library to my own code in order to have access to its actual features, and then down here, it's a little overwhelming at first glance, but I just followed the directions. There's something called div in HTML for a division of the page. It means give me this invisible rectangular region. The class I associated with it is called form-group. I didn't make this word up. This comes from Bootstrap. I just did what they told me to do. I then have a label, which makes things more accessible and you can click in different places. I have another class here but long story short, I just read the documentation because I know what tags are, I know what attributes are. I know a little bit of CSS now and I know how HTTP works, and so really I have enough building blocks in order to work on this myself. So that then is CSS and there's one last detail I thought I'd show us here. 
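As a rough illustration of the two ideas above (a page pulling in style sheets with link href, and Bootstrap styling applied purely through class names such as form-group), here is a minimal Python sketch that scans a small page the way a browser would when deciding which CSS files to fetch. The markup is an assumption loosely modeled on the form described in the lecture, not the actual form1.html, and the file name bootstrap.min.css, the field names, and the PageScanner class are placeholders chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: loosely modeled on the form described in the lecture.
# The href values and field names are placeholders, not the real files.
PAGE = """
<!DOCTYPE html>
<html>
  <head>
    <link href="bootstrap.min.css" rel="stylesheet">
    <link href="CSS4.css" rel="stylesheet">
    <title>form1</title>
  </head>
  <body>
    <form>
      <div class="form-group">
        <label for="email">Email</label>
        <input class="form-control" id="email" name="email" type="email">
      </div>
      <button class="btn btn-primary" type="submit">Register</button>
    </form>
  </body>
</html>
"""


class PageScanner(HTMLParser):
    """Collect linked style sheets and CSS class names found in a page."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []  # href of every <link rel="stylesheet">
        self.classes = []      # every class name, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel="stylesheet" states the relationship of the linked file to the page.
        if tag == "link" and attributes.get("rel") == "stylesheet":
            self.stylesheets.append(attributes.get("href"))
        # A class attribute may hold several space-separated class names.
        if "class" in attributes:
            self.classes.extend(attributes["class"].split())


scanner = PageScanner()
scanner.feed(PAGE)
print(scanner.stylesheets)  # ['bootstrap.min.css', 'CSS4.css']
print(scanner.classes)      # ['form-group', 'form-control', 'btn', 'btn-primary']
```

The page itself carries no aesthetics: it only names the style sheets and the classes, so the same CSS files can be linked from any number of pages and changed in one place.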
In all of these John Harvard examples, as in just a moment ago, we had something like this at the very bottom. This &#169;. What was that rendering as, if you notice, in the web page? AUDIENCE: Copyright. DAVID J. MALAN: Yeah, the copyright symbol. There is, on my US keyboard, no copyright symbol. So you need kind of a pattern of characters with which to represent those in HTML. So just like we have \n and other special escape characters in C, you have what are called HTML entities in HTML that you would only know from reading the documentation, but that's the copyright symbol, but I thought it was rather timely to point that out because just yesterday or this morning, Apple announced that with the very new version of iOS that you can soon download, they added even more damn Emojis to the Emoji character set. So these are certainly in vogue these days and not only do we see, now, a way to represent special characters that you couldn't otherwise type using HTML, it turns out all this time that Emojis are actually just characters, chars, but they're not 8 bits. Recall that C as we've been using it uses ASCII, which uses only 7 or 8 bits total and Emojis, my god. There's so many of them right now and we need more than 8 bits to represent them, and thus was born something called Unicode. Well, that is not why Unicode was invented, but this is what Unicode is now being used for because these emojis are simply like ASCII characters but multiple bytes, generally two bytes, maybe three bytes, and in fact, if you go on unicode.org, you can see that if the number in hex 1F600 represents the grinning face, which happens to be implemented differently by different companies on different devices, but if in closing here, I open up this same file and I change this to 1F600 in hex, 1-F-6-0-0, save, and I go back to my browser and I go back to CSS0, now we have a very happy web page for you. So that's it for today. 
I'll stick around for questions and we'll see you next time.
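The entity and Unicode discussion above can be checked with a short Python sketch. It decodes the decimal reference for the copyright symbol and a hexadecimal reference for code point 1F600, then counts how many bytes each character needs in UTF-8. The choice of UTF-8 and the variable names are assumptions made for illustration; the lecture does not name an encoding. In UTF-8 specifically, this particular emoji takes four bytes.

```python
import html

# "&#169;" is a decimal numeric character reference: 169 is the code point of
# the copyright sign.
copyright_sign = html.unescape("&#169;")

# The hexadecimal form uses an "x" prefix: 0x1F600 is the grinning face.
grinning_face = html.unescape("&#x1F600;")

print(copyright_sign, ord(copyright_sign))     # code point 169
print(grinning_face, hex(ord(grinning_face)))  # code point 0x1f600

# ASCII characters fit in one byte; characters beyond ASCII need more.
for ch in ("A", copyright_sign, grinning_face):
    byte_count = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} takes {byte_count} byte(s) in UTF-8")
```

Each character is a single code point regardless of how it is drawn, which is why the same number can render as a differently styled face on different devices.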
Info
Channel: CS50
Views: 110,440
Rating: 4.9084725 out of 5
Keywords: cs50, harvard, computer, science, david, j., malan
Id: PUPDGbnpSjw
Length: 100min 50sec (6050 seconds)
Published: Sat Oct 07 2017