How DNS really works and how it scales infinitely?

Captions
So, for us to connect to any machine on the internet, we need its IP address, right? Even when we type www.google.com in our browser, what the browser needs is the IP address with which it will establish the TCP connection. Finding that address is the DNS resolution process.

Obviously, on the backend side, Google would have a bunch of front-end servers responsible for serving google.com. These are the servers on which the code lives, in Angular or whatever framework they are using, and these are the servers capable of serving the Google homepage. Obviously these servers sit behind a load balancer. Assume a very simple architecture (obviously Google's is much more complex than this), and assume that the IP address of this load balancer is 17.53.21.253. This is the IP address to which your browser will establish the connection when someone types in google.com.

Now, how does your browser know that when someone reaches out to google.com it needs to connect to this particular IP address? Somewhere this mapping needs to be stored: www.google.com means 17.53.21.253. This is typically the A record or the CNAME record in the DNS configuration at Google. This configuration is done in something called a DNS zone. The zone is specific to google.com, and the mappings are stored in it: www.google.com is 17.53.21.253, bar.google.com is some load balancer CNAME, plus MX records, TXT records and whatnot. These are all stored as part of your zone configuration. If you're using AWS, you would know that what you configure in Route 53 is a hosted zone. That hosted zone is a logical grouping of the domain, all its subdomains, and the DNS records for them, so everything about google.com is in there. You might have multiple such zones for the different domains that you own.

Now, given that a zone is a logical entity, this mapping needs to be served from some server. These servers are called authoritative name servers. Their responsibility is plain and simple: given a request for a zone that they own, they respond with the record stored against it. For example, this pink zone here is the google.com zone, and say these three name servers are responsible for serving it. When a request for google.com comes in to one of these three name servers, because it owns the zone, it would go read the key-value pair, get the value, and send it back. That's what it would do. So authoritative name servers are the ones on which zones are hosted, and given a request they send a response with the corresponding value. This way, if someone reaches out to these name servers directly and asks "I want to connect to www.google.com, where should I go?", the name server owns the google.com zone, so it can look up this IP address and serve it.

But somehow the request needs to reach these servers. What are these authoritative name servers? Their names typically look like ns1.dns.com, or the awsdns name servers. For example, you can buy a domain from GoDaddy but choose to use AWS's name servers so that you can manage your DNS through Route 53. You can very well do that. So we have reached the point where, if we somehow know the name of the authoritative name server, then given a request it would give me the record
that I was looking for. But how does your browser know this? Your browser cannot possibly know it for every possible domain that exists.

There are two ways to do it. The obvious first one is to have a single gigantic system that contains all possible domains and all possible subdomains, with all the mappings stored against them. Obviously that is millions and billions of entries in a single system. It is not scalable. It is not fault tolerant, because if it goes down the entire internet is down. It is not even manageable, given the amount of reads coming in, the amount of writes happening on it, and the sheer volume of data it needs to handle. That's why a single system would not work here, given just the sheer amount of requests and data it would need to handle. This is why the world has gone with the approach of decentralization, where no one machine knows it all.

So what does that look like? We know that we have to somehow reach the authoritative name server, this ns1.dns.com. You cannot have a single place that knows all the details, so it needs to be a decentralized system. This is where a critical component comes in, called the DNS resolver. The DNS resolver is the one that does the DNS resolution process. We'll go into the specifics, but whatever the process is, it is done by the DNS resolver.

Now, where is this DNS resolver running? It can run at multiple places. One popular place is at your ISP level. Let's say you are using ACT internet, so ACT is your ISP. When the request goes to the ACT backbone, the server handling it would be doing the resolution for you. Your browser typically does not do that, and your OS typically does not do that. You can make your OS do it, by the way, and you can write your own DNS resolver, it's not a big thing. But to simplify, your ISP can do it for you. The request will go through the ISP anyway, so your ISP is doing the resolution
for you and responding with the IP address so that you can establish the TCP connection with the backend of Google.

Otherwise, if you're on your home internet, even your router is capable of doing it. Let's say you are on a Wi-Fi network and connected to a router: that router can do the DNS resolution for you. I took a screenshot from my machine. If I'm on Windows and I fire ipconfig /all, I get an entry called DNS Servers, which contains the IP address of the machine that is my DNS server, which is my DNS resolver. It is 192.168.0.1, the IP address of my router. So in my case my router is doing the DNS resolution for me. The request from my machine goes to the router, and the router does the DNS resolution and, obviously, caches the domain name to IP mappings, because it does not need to resolve them again and again.

In my case the router is doing it, but you can change this to one of the popular DNS resolvers, like Google's DNS resolver at 8.8.8.8 or Cloudflare's DNS resolver at 1.1.1.1. These are just DNS resolvers. The request goes to the DNS resolver. If it has the IP address against the domain name, it sends it back. If not, it does the entire resolution process, caches the IP address, and sends the response to the user. This is what happens at the DNS resolver level. I would highly recommend you run this command on your terminal and see what you get in the output. That is your DNS resolver. Try changing it to one of these and see what happens.

Okay, now what does the DNS resolution process look like? Assume that there is no caching at any layer, so this is a very crude version of the process. Say you're looking for www.google.com, which means you typed www.google.com in your browser, and your browser needs to somehow establish a TCP connection with that 17.53.21.253 machine. The request from your browser went to the router. Assume that your router is doing the resolution part for you. What will the router do? It will talk to something called a root name server.

What is this root name server? In the world there are exactly 13 root name servers in total. They have names like a.root-servers.net, b.root-servers.net, and so on till m.root-servers.net. Each root name server is owned by a company or an institution. Let's say the first one, a.root-servers.net, is owned by Verisign, and b.root-servers.net is owned by the University of Southern California.

Now, 13 root name servers existing in the world does not mean there are 13 physical servers. It means there are 13 fixed IP addresses, which are the root name servers' IP addresses. Whoever wants to get a domain name resolved can talk to one of them, and it would spit out the other details as part of the protocol. If these had been just 13 physical machines, just imagine the sheer load those 13 machines would be getting. So obviously this needs to be distributed even further.

How does that work? The key thing is that although there is one fixed address for each root name server, owned by a particular organization, that does not mean there is exactly one physical server behind it. There are many, distributed across the world, and they all advertise the same IP address using anycast. For example, the a root name server has some fixed IP address. Let's say there are 50 servers across the world for it. They all announce the same IP address, whatever that IP address is, and they announce it with anycast.

The beauty of this is that whenever any machine, let's say my machine, is trying to resolve www.google.com, it will go to one of these 13 root name server IPs. These are hardcoded, literally 13 hardcoded IP addresses, and it would reach out to one of them at random. Whenever it reaches out, the request lands on the nearest physical server advertising that IP, and that server returns the response a root name server is supposed to return. That's the beauty of this.

So again, I'm reiterating: there are 13 root name servers, which means 13 fixed IP addresses. It does not mean there are 13 physical servers. For each IP address, for each root name server, there are many servers, even hundreds, distributed across the world. Whenever your router is trying to connect to a root name server, it connects to one of those machines. Which machine? The nearest one. How? Using anycast. Anycast is a separate beast in itself. I've already explained anycast in one of my videos, and I would highly recommend you watch it. The video on my channel is "how Google uses anycast to make their load balancers highly available". It covers a very old paper on how Google made their load balancers highly available using anycast. I highly recommend it if you want to know the intricate details of how things work behind the scenes.

Okay, now assume that your request went to the closest root name server. What would that server do? It responds with the IP address of another server, the one that is supposed to handle the top-level domain .com, because we are looking for
google.com. It would return me the IP address of a TLD server, a top-level domain server, which is responsible for handling .com. Similarly, for every top-level domain there is a set of servers that handle it: a set of servers for .com, a set of servers for .in, a set of servers for .edu, and so on and so forth. The root name server is aware of these TLD name servers so that it can respond with them. That is the responsibility of the root name servers.

Now you would say, "But Arpit, every time anyone accesses a website, do all the requests go to the root name servers?" Not really. Your browser is caching a lot of stuff, and your operating system is caching it too. The root name servers have fixed IP addresses, and even if, let's say, I get the IP address for the .com TLD name server, my machine can cache it, because how often would that change? Not frequently. So there is a lot of caching involved here, so that the request can go directly to a .com server, then to the authoritative name server, and whatnot. That's how it works.

So what does the entire process look like? It looks very simple. Let's say this is your router or your ISP doing the DNS resolution for you. It first talks to a root name server. The root name server spits out the IP address of the machine that owns the .com TLD. The resolver goes to that machine, which responds with the authoritative name server that is configured, ns1.dns.com. The resolver in turn goes to that authoritative name server, which owns the google.com zone. Because the next request goes there, it can look up what the value against www.google.com is, 17.53.21.253, and that's what it responds with. Your DNS resolver gets it and spits it out to your browser. Your browser then establishes the connection to this IP address, which is your load balancer. The load balancer then forwards the request to one of the backend servers or the front-end servers and
responds with the HTML. That's how your connection is established, then the HTTP request goes out and the response is sent back.

This is one of the most beautiful pieces of software ever written. It made the internet what it is today, because it gave a human-readable name to every single thing out there, not requiring us to remember weird IP addresses of machines. That's the beauty of it, and this is how the entire DNS resolution process works.

Just to summarize, assume that nothing is cached. The request comes to a DNS resolver, let's say a router. It goes to a root name server. The root name server responds with the IP address of a server handling the .com TLD. The request goes to that server, which responds with the IP address of the authoritative name server that owns the google.com zone, because you're looking for www.google.com. The resolver then goes to that name server, and because it holds the corresponding zone, it can respond with what is stored against www.google.com, which is this IP address. So at every step you narrow it down: you ask the root for the .com TLD, you go to .com and ask for google.com, and you go to that name server and ask for www.google.com. It is a step-by-step resolution. This is how you eventually get the IP address with which you can establish the TCP connection, send the request, and get the response. This is what the entire DNS resolution process looks like.

I'm kind of going down the rabbit hole of exploring DNS in depth, so my next set of videos will be about me trying to build my own DNS server. Stay tuned on my channel, there is a lot of interesting stuff coming along. So yeah, this is all I wanted to cover today. I hope you found it interesting, I hope you found it amazing. That's it for this one, I'll see you in the next one. Thanks.
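The step-by-step walk summarized above (root, then the .com TLD, then the authoritative name server, with the resolver caching the answer) can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how a real resolver is implemented: no network traffic is sent, the server labels "root" and "com-tld", the `query` helper, and the `Resolver` class are hypothetical names, record TTLs are ignored, and the address is just the example value used in the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "servers". Each maps a name suffix to either a
# referral (the next server to ask) or a final answer (an IP address).
ROOT: Dict[str, Tuple[str, str]] = {"com": ("referral", "com-tld")}
COM_TLD: Dict[str, Tuple[str, str]] = {"google.com": ("referral", "ns1.dns.com")}
AUTHORITATIVE: Dict[str, Tuple[str, str]] = {
    "www.google.com": ("answer", "17.53.21.253"),  # example value from the talk
}

SERVERS = {"root": ROOT, "com-tld": COM_TLD, "ns1.dns.com": AUTHORITATIVE}


def query(server: str, name: str) -> Tuple[str, str]:
    """Ask one simulated server about a name; the longest matching suffix wins."""
    table = SERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    raise LookupError(f"{server} has no data for {name}")


class Resolver:
    """Toy resolver: walks referrals starting at the root and caches answers."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.trace: List[str] = []  # every server contacted, in order

    def resolve(self, name: str) -> str:
        if name in self.cache:  # cache hit: no server is contacted
            return self.cache[name]
        server = "root"
        while True:
            self.trace.append(server)
            kind, value = query(server, name)
            if kind == "answer":
                self.cache[name] = value
                return value
            server = value  # follow the referral to the next server


resolver = Resolver()
print(resolver.resolve("www.google.com"))  # 17.53.21.253
print(resolver.trace)                      # ['root', 'com-tld', 'ns1.dns.com']
resolver.resolve("www.google.com")         # second lookup is served from cache
print(len(resolver.trace))                 # 3
```

The second lookup never touches the simulated servers, which is the caching behaviour described for routers, operating systems, and browsers. A real resolver would also expire cached entries and would cache the referrals themselves, not only the final answer.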
Info
Channel: Arpit Bhayani
Views: 21,703
Keywords: Arpit Bhayani, Computer Science, Software Engineering, System Design, Interview Preparation, Handling Scale, Asli Engineering, Architecture, Real-world System Design, DNS, How DNS Works, DNS Explained, DNS REsolution Process, 1.1.1.1, 8.8.8.8, DNS Protocol, Name servers, Root name servers, Authoritative name servers, Role of name servers, how DNS scales, Anycast protocol
Id: g_gKI2HCElk
Length: 16min 35sec (995 seconds)
Published: Fri Apr 05 2024