TCP/IP turns 40 | The Backend Engineering Show

Video Statistics and Information

Captions
The TCP/IP protocol turns 40 this month, September 2021. It was released in September 1981, almost my age (I'm 38). I just want to make this video to pay tribute. You do tributes for people who die, right? No, to show my respect. That's for people who are dead too. No, to show my appreciation for this pretty, beautiful, sexy protocol, and to show a little bit of how it works. When you read about this protocol, you read junk. Everybody talks about theory: IP is for routing and addressing, TCP is for retransmission. No one cares. Let's talk about what is actually happening on the internet. Let's jump into it.

Welcome to the Backend Engineering Show with your host, Hussein Nasser. TCP/IP is the protocol that was built to standardize communication between two computers. It was first built for the Department of Defense in America, by an agency called ARPA, I think. I don't know the history and frankly I don't really care. We needed networking, we built it. I'm just interested in how it was built. The history is fine, we appreciate it, but I find myself falling asleep every time someone talks about ARPA. I don't know if it's just me. Okay, we get it, it's old.

Now I want to talk about the layers: just IP and then TCP, just these two. Maybe we'll mention UDP. But before we talk about the IP protocol: we have two computers, each of them has a network card, and the network card has a globally unique identifier called the MAC address. Why don't we use this for global communication? It's unique, so let's just use that. That's what everybody thinks: since I have a unique address, and we have this idea of frames at layer 2, the data link layer as the OSI model calls it, why don't we just communicate with MAC
addresses? I'm willing to type my MAC address into the browser. That's probably a bad idea because you can't remember it, but fine, let's have DNS point to the MAC address and then it's the same thing. Why do we need this IP thing?

Let's think about it. As far as routing goes, MAC addresses are effectively random: they carry no knowledge about a network or anything like that. So imagine a group of networks or computers connected to each other, with all these wires going to a switch or whatever. Say you want to send a message from MAC address A to MAC address B. What do you do? You have to send it to everybody. Nothing in the MAC address tells you, "let's take a shortcut, I know these MAC addresses are only in this area," or "this MAC address is not in this group, so only broadcast it here." That's impossible to scale to billions and billions of devices. You'd have to scan the whole internet. That's dead on arrival. So that's the problem with MAC addresses: they are not routable. You have to send it to everybody, and whoever gets it checks whether it's actually theirs and then accepts the packet.

I know bridges and switches have a smart capability where they say: if a source MAC address sent me something on this particular port, then it must exist on that port, so I keep track in my table that MAC address A is on this port. Then when you send something for that MAC address to the switch, it knows which side of the bridge or switch it's on, so it doesn't need to spam every port it has. But how would that scale on the internet? It's impossible; you plug in devices all the time. This might work on an enterprise network where you have a few thousand
computers maybe, but on the internet it will not work. So people needed a way to differentiate the sub network from the hosts. The idea is to take the whole network and create sub networks, and then, when I send to an address (whatever we're going to call it), find a way to eliminate where we're not going to send it next. Find a way to shortcut this: instead of scanning 100 networks, only send it to one or two. That is the trick here.

It's just like a database. When you have a large table with 100 million rows, the shortest path to what you're looking for is to not scan 100 million rows. You do anything you can to avoid scanning 100 million rows. That includes partitioning the table, that includes indexing the table, and only going to the page that you're absolutely sure contains the row you're looking for. The concept of routing is very similar.

That's why the IP protocol, the Internet Protocol, was invented. As a result you now have IP addresses with four octets, that is, four bytes, so you have around four billion addresses. They thought back in 1981 that was enough, and to be honest, with NAT, I still think it's okay. I know there's IPv6. I don't know that much about networking, to be honest; I know the basic things that I can comfortably talk about. But I don't see IPv6 being as popular as it claims to be. I might be wrong, though; let me know in the comment section if you think otherwise. I don't see it picking up. Ninety percent of the time I work with IPv4, despite us being in 2021.
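To make the "four octets, around four billion addresses" point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The address `192.168.1.3` and the `/24` mask are assumptions picked only for illustration; the talk does not specify a mask at this point.

```python
import ipaddress

# Hypothetical private address, used only as an example.
addr = ipaddress.IPv4Address("192.168.1.3")

# An IPv4 address is four octets (bytes), which is one 32-bit integer.
octets = list(addr.packed)
as_int = int(addr)

# Four bytes give 2**32 possible addresses, roughly 4.3 billion.
total_addresses = 2 ** 32

# Assumed /24 mask (255.255.255.0): the first three octets name the
# network and the last octet names the host inside that network.
iface = ipaddress.ip_interface("192.168.1.3/24")
network = iface.network
host_part = as_int & int(network.hostmask)

print(octets, as_int, total_addresses, network, host_part)
```

The useful part is the last two values: a router only needs the network portion to decide where to forward, which is the shortcut that flat MAC addresses cannot offer.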
So now the IP address contains two portions, whether you like it or not. There's the network side: what network do you belong to? That is determined by something called the subnet mask. That's why it's critical for the subnet mask to be shipped with the IP address; you can't really determine the network without the mask. When you apply the mask, you know which part is the network and which part is the host. I could go into the details, but in general you can find out the network. The network can have a lot of machines, or hosts, on it, but it gives you a general idea: this network is on this side, that network is on that side. So instead of scanning individual hosts, as with MAC addresses, you're now scanning networks. Sure, the network can grow large and you can have multiple sub networks, until you eventually find the destination. That's the shortcut. That's the routing idea.

So the IP protocol, which is this layer 3 thing, always has two pieces that are critical to us, at least as engineers. Network engineers might look at the IP header and find more information that they're interested in. I'm interested in two things: where is this packet coming from, and where is this packet going to? That's the information it has: source IP and destination IP. Based on that knowledge, each router that receives the IP packet looks at the destination address: you're going to this particular IP address, which sub network is that? Let me apply the subnet mask. So instead of having x options and ports to send it to: this network is on this port, send it there. And not just "send it there"; the router makes a lot of decisions here. It says, okay, I can get to this host through this particular
network, and from my history I also got to it through this other network. So there are multiple networks, and there is a cost associated with each: going through this one costs, for example, five, versus two for the other. Of course you're going to pick the shortest path, not all the time, but you can. The router makes this decision, and there are multiple paths to the same destination. All roads lead to Rome.

So IP packets will be delivered over different routes even if they are going to the same destination. At this moment port A is available, but a few milliseconds later port B is actually faster, because port A goes through a path that is congested and is losing packets. So packets go here and there, and they're going to arrive out of order, they're going to arrive jumbled, they might be lost. IP doesn't care. The IP protocol doesn't care about actually delivering the packet and making sure it arrives. It can tell you, if the packet arrives, whether it's bad; I think it has a checksum. I might be wrong, maybe that's UDP.

When you arrive, you just arrive at the machine that has this IP address. That's pretty much it. Not much information, is it? This IP address might not even be the actual final machine. It could be a router, your NAT router: as a public router you represent maybe five or six or seven private LAN machines. Which one actually sent this? I have no idea. That goes into the discussion of NAT and things like that. Regardless, the IP protocol just delivers the packet to the machine. You have no idea which application
even requested that. Is it this process over there, or this browser instance, or this Brave tab? Who sent this packet? You have no clue. That is why another protocol was invented on top of IP to, quote unquote, solve these problems and give you finer control. My machine is not running one application like in the 1980s. If you're in the 1980s and you have one application on the machine, then an IP packet that arrives at the machine must belong to that application. No, we're running many applications, my friend, and we want to uniquely identify them.

Meet ports. That's why TCP, the Transmission Control Protocol, introduces ports (UDP too). Now I have two identifiers. I have the host, which is the IP address. Once the IP packet reaches the host, that's not enough; you've got to tell me where you're going. The operating system needs to know which port you're addressing. If you run a web app, you're usually going to port 80 if it's unencrypted. Nobody visits port 80 anymore; everything is encrypted, because that's the way it is, so you go to port 443. The IP packet carries a segment as its content, and that eventually gets delivered to the application.

Now let's step back and establish a TCP connection. The idea of a connection in TCP is designed so that we can guarantee the packets arrive in order. We talked about how IP packets have a mind of their own: routers pick whatever path is shortest at that moment, and as a result the IP packets might arrive out of order even if they are sent from the same machine. If you're sending a GET request with the fetch API, that might translate to many, many IP packets, and they will not necessarily arrive in order even if you send them in order. The internet
will take different paths, and eventually they're going to arrive out of order. So the TCP connection establishes this stateful thing and marks each packet with a sequence number: you're one, you're two, you're three, four, five, six. It's not quite that simple. There's the length and so on, so the sequence numbers don't count up one by one; they are offset by the size of the actual content. But you get the idea.

The TCP connection at the other end receives this content and starts ordering it: you're one, you're two, got it, three... seven? Wait a second, where are five and six? I'm going to wait, I'm going to block. Five and six didn't arrive, so ask for a retransmission: sorry, I didn't get this, please resend it. TCP takes care of all that. That means TCP guarantees this beautiful reliability, and as a result it is slower. That's why a lot of people don't like TCP.

And it has many other features. It says: I need to take the pulse of the network. I'm going to start slow and send you just a little data, and then gradually increase the window so I can feel out the congestion. This is called congestion control. You don't want to send a huge amount of data as soon as the connection starts, because it's going to get lost. A lot of packets means congestion, and congestion means slow. If packets are lost, the other party won't receive them and they have to be retransmitted, and retransmission means slow. It is an indication that packets are being dropped, and when packets are being dropped it usually means you're sending too much data. Slow down, son. That's how TCP controls congestion. It's like, okay,
slow down; oh, go, open up. So it basically controls that. There's a lot of stuff, and as a result it slows things down, but now you have these features. So IP is just the actual host: you're going to this machine. And TCP has ports, sequence numbers, lengths, and many other things as well. But what are we engineers interested in, especially a backend engineer? I'm interested in the port. That's pretty much the only thing I care about in TCP, to be honest. Well, maybe also the SYN request and the ACKs, the initiation: I'm about to establish a connection. That's called synchronization, by the way: the synchronization request, or SYN, synchronizes the sequence numbers we talked about, because both sides need to agree on the sequence numbers of the packets we're going to send.

Another thing I didn't mention: at a higher level, you're on the application layer and you want to send a piece of data. Say I want to send a GET request: GET / HTTP/1.1, blah blah, headers, cookies. So I have a bunch of bytes. At the end of the day this is translated into TCP segments. We're at layer 4 now, which is TCP. You start breaking it up. The maximum TCP segment size, the MSS, is, I don't know, say 128 bytes. So pop, pop, pop, you break it up, and say I have five TCP segments. Each TCP segment is prepended with the destination port (where are you going? we're going to port 443) and the source port (where are you coming from?). That's the client port, which is a kind of randomly generated port, so that when the server sends the response back, the host knows where to deliver it. So the source port is critical as well. The destination port is 443, the source port is just a random number, and you add these headers over and over again.
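The breaking-up step described here can be sketched in a few lines. This is a simplified model, not real TCP: the `Segment` class, the `segmentize` helper, the 16-byte MSS, and the source port 51234 are all assumptions made for illustration, and real sequence numbers start from a random initial value rather than zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    src_port: int  # client's randomly chosen port (hypothetical value below)
    dst_port: int  # server port, 443 in the talk's example
    seq: int       # byte offset of this chunk within the stream (simplified)
    data: bytes


def segmentize(payload: bytes, mss: int, src_port: int, dst_port: int) -> List[Segment]:
    """Cut the payload into chunks of at most mss bytes, one header each."""
    return [
        Segment(src_port, dst_port, offset, payload[offset:offset + mss])
        for offset in range(0, len(payload), mss)
    ]


# On port 443 these bytes would really be encrypted TLS records;
# plain text is used here only so the chunks are readable.
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
segments = segmentize(request, mss=16, src_port=51234, dst_port=443)

for s in segments:
    print(s.src_port, s.dst_port, s.seq, s.data)
```

Note that every segment repeats the same pair of ports, which is the per-segment overhead the talk refers to, and that `seq` advances by the size of the data rather than by one.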
And this header cost is around 40 bytes per packet once you count both the TCP and the IP headers, roughly 20 bytes each at minimum. Yes, it's big. So if you're sending five segments, that's a lot of overhead. And get this: each TCP segment, including the ports and the rest of the headers, is shoved into an IP packet. Layer 3, which is the IP protocol, doesn't even care what's inside. It takes the destination port, the source port, and that part of the GET request, shoves it all into the IP packet, and go on your merry way: you are going to this IP address and you're coming from this source IP address. Routers take care of routing the IP packet. What's inside it, they don't care. They should not look inside it, but routers actually do; that's another topic. All that routers see is a flood of IP packets, each with a destination and a source address. That's it. So the packets are shifted left and right until they reach the destination. They're going to arrive out of order, and the destination is going to reorder them and get retransmissions as needed.

Some routers, especially my ISP router, the router you have at home, actually keep track of connections. It's like: wait a second, you have an IP address that is private, 192.168.1.3. You can't send that out to the public internet; you're naked, you can't go out. Wait: I, as a router, have a public IP address. I'm going to NAT you up. Get it? NAT you up: network address translation. I like that. So I'm going to change your source IP from 192.168.1.3 to 4.1.3, my public IP address. This is now me representing you. But I'm representing all these other private machines as well: 192.168.1.7 and 8 and 9. So we need some sort of table to map that. You went to this destination and you came from this port, so I'm going to add a uniquely identified record that points back to you, so that when packets come back to the source port you told me about, I send them back to you. I know exactly which host machine
and exactly which source port you originally used. The router at this stage actually keeps track of the connection. So, okay, you sent a SYN request to establish a connection; I know you're establishing a connection, and I'm going to keep track. The next things I'd better see are a SYN-ACK and then an ACK. That's to protect against spoofing attacks and things like that. Plus, the router doesn't want you to keep a connection alive forever. It will say: you established a connection with this server and left it idle for, I don't know, an hour; I'm going to drop any knowledge about this connection. So the next time you send data, the router checks and says: wait a second, you're trying to send data to this IP address? What is this? Let me look. Oh, that's a TCP segment. Sorry, you can't send a TCP segment; you never established a connection with this guy. You'd better establish a connection first. "But I did." "No, you didn't, I have no knowledge of it." The router dropped its knowledge of the connection.

All this rambling just makes me appreciate this protocol: the IP protocol and the things on top of it, TCP and all the features built into TCP. The problem with the features built into TCP is that there are too many that not everybody needs. Sometimes I want to stay close to the IP protocol and I just want ports. Give me the feature of delivering packets to an application, not just a host: a host and a particular application. I don't care about retransmission, because you're slow and you don't know the context of my application. You just treat packets as packets. No, my packets are not just packets; I have more knowledge. That's how HTTP/2 came into the picture and suffered from TCP head-of-line blocking, and then QUIC came into the picture and pushed
that knowledge on top of UDP and introduced the idea of streams, which I've talked about in many other videos. So let's take a moment to appreciate this beautiful protocol, the IP protocol and the TCP protocol, that literally everybody is using. It's used everywhere.

One last thought: while we do appreciate this protocol, it's always good to challenge it and ask questions. Why are things the way they are? Most probably not a lot of people know the answer, because it's just forgotten. The people who built this were probably already older in 1981, and many of them are probably dead, and this knowledge is lost with them. Yes, some notes get transferred into books, but you can't transfer everything. Some of the knowledge will be lost. Some people lose track of why we're doing this this way: "because we've just always done it this way." You can challenge literally everything.

Unfortunately we are bound at least to the IP protocol, because if you want to send something across the internet you have to use at least IP. You don't have to use UDP, you don't have to use TCP, but you have to use IP. I don't know, middle routers might actually check that you're using at least UDP or TCP. But what if you created your own protocol directly on top of IP? Say my host is just a single machine with a single app. You create a special protocol that just delivers: it's not just any operating system, it's a whole custom thing, and I just want the data delivered. It's okay if it's out of order because I just want the gist of it, certain information. Make it as tiny as possible and just deliver it. UDP, which is this connectionless thing, usually does most of the job, and does it really well. But what if you can build something
even better? And can you even build it? I don't know. It's just good to think about all these things. All right, guys, that was a quick video to appreciate the TCP protocol. I'm going to see you in the next one. You guys stay awesome. Goodbye.
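The closing thought contrasts TCP's ordered delivery with a protocol that is fine with data arriving out of order. As a rough sketch of what the ordered side has to do, the hypothetical `reassemble` function below holds chunks back until the gap before them is filled. It is a toy model under stated assumptions: offsets start at zero, chunks never overlap, and there are no timers, ACKs, or retransmission requests.

```python
from typing import Dict, List, Tuple


def reassemble(arrivals: List[Tuple[int, bytes]]) -> Tuple[bytes, List[int]]:
    """Return (bytes deliverable in order, offsets still stuck behind a gap)."""
    pending: Dict[int, bytes] = {}
    delivered = bytearray()
    next_offset = 0
    for offset, data in arrivals:
        if offset >= next_offset:       # ignore chunks already delivered
            pending[offset] = data
        # Release every chunk that is now contiguous with delivered data.
        while next_offset in pending:
            chunk = pending.pop(next_offset)
            delivered += chunk
            next_offset += len(chunk)
    return bytes(delivered), sorted(pending)


# Chunks arrive out of order but nothing is missing.
data, held = reassemble([(0, b"GET "), (8, b"TP/1"), (4, b"/ HT")])

# The middle chunk is lost: the chunk at offset 8 has to wait.
data_lossy, held_lossy = reassemble([(0, b"GET "), (8, b"TP/1")])

print(data, held)
print(data_lossy, held_lossy)
```

The second call shows the blocking behaviour the talk describes: bytes that arrived safely stay undelivered because an earlier chunk is missing, which is the cost a protocol that tolerates out-of-order data would avoid.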
Info
Channel: Hussein Nasser
Views: 4,849
Keywords: hussein nasser, backend engineering, internet protocol, ip address, tcp/ip, tcp, arpa, arpanet, dod
Id: 1xJXTJzp7Mw
Length: 28min 28sec (1708 seconds)
Published: Fri Sep 03 2021