What happens before the Backend gets the Request

I'd like to take you on the journey of a request — that unit of work the frontend sends to the backend — and walk through exactly what happens from the moment the request is sent until it reaches the backend user-space process for processing. It's often the case that we focus on the first step, sending the request, and the last step, actually processing it — that moment you get the beautiful on-request event in Node.js or Python: hey, someone just sent a request. That's all you see in the backend process, but there is so much to it, so many things that happen in between. Sometimes it's nice to close our eyes and just wish that this magic happens in the most performant way — but it doesn't always. These things are hidden from us: by TLS libraries like OpenSSL, by HTTP libraries if you're using HTTP as your protocol, and by the kernel with all its mighty capabilities. But just because these things are hidden doesn't mean they don't exist. There's so much work involved here, and really understanding it is what I tend to push on this channel. You don't have to build everything yourself — that's not what I'm saying; use these libraries. But backend engineers specifically — and full-stack, mandatory — have to understand this stuff. We need to understand exactly what happens at every stage of the lifetime of a request, because we tend to take it for granted, and when things go wrong we feel paralyzed, because we don't understand what's happening. When we can't scale beyond a thousand connections per machine, we tend to just throw more hardware at it, or spin up more machines, or go distributed — prematurely, might
I say — adding more hardware without actually understanding the cause of the bottleneck. I talk about that in my new course on unlocking backend performance bottlenecks. So many things happen before the request is ready to be processed, and I'm here to go through that journey and illuminate those steps. But before we get started, we need to define what a request really means. So how about we get started — welcome to the Backend Engineering Show with your host, Hussein Nasser.

A request is a unit of work submitted by a frontend of some sort, in the form of a specific, well-defined protocol, and that protocol usually sits on top of either TCP or UDP — the de facto transport protocols of the internet. You might ask: why do I need another protocol to define requests? Can't I just say my requests are always 10 bytes long, they always start with this particular binary code, and that's their end? Well, you just defined a protocol — your own application protocol that segments the stream of bytes. Always think of the TCP layer as a hose, a water hose: it's as if someone is pointing a constant stream of water at a machine. It's just a stream of bytes, and the machine reads it, but the bytes have no meaning per se. You define the meaning in the application by actually reading that stream: you read and read and say, oh, here's the start of a request — and you define what that means. Oh, I see the head of a request, let me read it — never mind, false alarm, toss it. Oh, actually this is a request: it's a GET, space, slash, HTTP/1.1.
Looks like an HTTP/1.1 request — all right, let me continue reading. Oh, I see a Content-Length header: it says a thousand bytes, so right after I finish the headers I need to read a thousand more bytes — that's the body of the request. It has to be a POST in this case, because GETs don't have bodies. And that, as a unit, is my request. I deserialize it into the language of my choice — C++, C, Python, Ruby, Node.js, whatever — and I get a nice object in the backend called req: that's your request. But there's so much work that happened there, and essentially it's all just defining the start of a request and the end of a request. The backend has to read the stream of bytes and constantly do this parsing we talked about — find the start and the end of each request and chunk those bytes into meaningful units of work. And HTTP/1.1 as a protocol is different from HTTP/2, which is different from HTTP/3, when it comes to the on-wire representation of a request — completely different. HTTP/1.1 is the simplest, most elegant representation: whatever you receive on an HTTP/1.1 connection is the request, there's no other stuff to it, it's straight to the point. HTTP/2? No — there's an additional structure wrapped around requests: there are streams, streams have frames, there's a DATA frame, a HEADERS frame, and other types of frames. There's more work involved in parsing HTTP/2 than HTTP/1.1, and that's why HTTP/2 is more CPU intensive — your backend process, believe it or not, is doing work it didn't have to do in HTTP/1.1.
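To make that framing work concrete, here's a minimal sketch of what an HTTP/1.1 parser has to do — scan for the end of the headers, pull out Content-Length, then read exactly that many bytes as the body. The function name and return shape are mine, purely for illustration; real servers use hardened, incremental parsers.

```python
def parse_http_request(stream: bytes):
    """Find one request's boundaries inside a raw byte stream.

    Returns (request, leftover) or (None, stream) if incomplete."""
    # Headers end at the first blank line (CRLF CRLF).
    head_end = stream.find(b"\r\n\r\n")
    if head_end == -1:
        return None, stream          # haven't even received the full head yet
    head = stream[:head_end].decode("ascii")
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length tells us where the body (and thus the request) ends.
    body_len = int(headers.get("content-length", 0))
    body_start = head_end + 4
    if len(stream) < body_start + body_len:
        return None, stream          # body hasn't fully arrived yet
    body = stream[body_start:body_start + body_len]
    leftover = stream[body_start + body_len:]
    return {"request_line": request_line,
            "headers": headers,
            "body": body}, leftover
```

A GET with no body parses immediately; a POST whose body hasn't fully arrived comes back as None — exactly the "incomplete request" situation we'll hit later in the parsing step.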
And then your backend also calls into an OpenSSL or LibreSSL library to do the decryption and encryption for TLS sessions — that's also a non-trivial cost. Before you even start parsing, you need to decrypt the bytes. That's the problem we're faced with: there's so much happening. So that's a request. Now, to summarize before we go through the steps: a request has a beginning and an end, and those are defined by your protocol. Even SSH has requests — if you type something like ls and hit Enter in a bash session, you're sending a request and waiting for a response from the server, and that too has its own well-defined shape. So let's go through the stages — a request goes through six steps. Before we even send our request, we need a vehicle to send it on, and that's the connection. It's customary to establish a connection between a client and a server, and it's often a TCP connection: you send a SYN, get a SYN-ACK, reply with an ACK, and now you have a connection. That means you as the frontend and the server as the backend have agreed on certain things — you agree on the window sizes, you agree on the sequence numbers — and now you can start sending your bytes. Not requests: TCP knows nothing about requests, it's all just bytes to it. You use this stream of bytes as the vehicle to start sending your requests, in whatever format you want, in whatever protocol you want. But before we actually build that connection, what is really happening? Let's explore the connection stage and how the kernel is involved — and how the backend is involved as well. When you first listen on a port — say 8080, TCP — on a specific interface, a specific IP address, then when you do that,
the kernel creates things for you, just for that listener — the 8080 listener. It creates a socket object, and that socket object is a file descriptor — everything in Linux is a file — associated with port 8080 and that destination IP. You can also listen on all interfaces: 0.0.0.0 means all IPv4 interfaces, and a double colon (::) means all IPv6 interfaces. That's not always a good idea, because your attack surface just unnecessarily expanded — listen only on the things you actually need to listen on; that's always a good idea. So now that file descriptor points to this port and this IP address — let's give it the address 1.2.3.4. The kernel also creates two data structures associated with this socket — they're essentially pointers — called the SYN queue and the accept queue. The accept queue holds the full-fledged connections, while the SYN queue holds the SYNs that have arrived at this particular socket, held temporarily until the handshake completes. And this is all in the kernel — the backend isn't involved in any of it; all you did was call listen, and the kernel created these two queues for you. Now let's say a client wants to connect to 1.2.3.4 on port 8080. It sends a SYN, and that packet goes all the way to the NIC — the network interface controller, or card. That SYN — still just a packet, a TCP segment — moves from the NIC into kernel memory through a process called DMA, direct memory access, where the data is moved directly without the CPU being much involved: the kernel reads directly from the network
card's memory — or you can flip it around and say the network card writes to kernel memory directly. Now the kernel finds out: oh, we got a SYN. What's it addressed to — 1.2.3.4, port 8080? It does a lookup: do I actually have a queue for that socket? If it doesn't, it drops the packet — and well-behaved kernels will reply with an ICMP message saying destination unreachable, port unreachable, or similar. But if the socket does exist, the kernel puts that SYN into the SYN queue for that listener — again, all in the kernel — and immediately replies to the client with a SYN-ACK to continue the handshake, then waits: I sent the SYN-ACK, now I need an ACK. Once the client ACKs the SYN-ACK, that's the three-way handshake complete, and we have a full-fledged connection. The kernel moves that full-fledged connection to the accept queue — another queue, also in the kernel — and puts it there. The connection lives there now, but is it ready to be consumed? Not yet. Now it's us, the backend application, that's responsible for accepting the connection — and that's literally a system call: accept. You call accept on the listener's file descriptor: hey, if there's a connection, I want to accept it. When you call accept and there is a connection in the accept queue, the kernel locks the structure, pops the connection, and returns a new file descriptor representing that connection to the backend application. Now the backend has a pointer, if you wish, to the connection and can talk directly to the client through that file descriptor — we've essentially established a line connecting the client all the way to the backend. Nice, we have a connection. So how large is
this accept queue? Let's go into the details. You can actually specify how large the accept queue can get, because if the backend doesn't accept connections, they just sit in the accept queue — but for how long? Until the accept queue is full. And what determines its size? You do, as the backend application: when you call listen, you pass a parameter called the backlog, and the backlog specifies how many connections can live there unaccepted before new SYNs start being dropped or timing out — hey, my accept queue is full; as the kernel, I cannot complete new connections. All right — first step: accept. There's so much work in just that one step that you as a backend engineer need to understand, because your challenge here is to support a lot of connections, and first you have to learn how to accept connections as fast as possible. Say the backlog — the accept queue size — is 128.
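The listen-with-backlog and accept pair we just walked through looks like this in Python — a toy localhost sketch, but the system calls underneath are exactly the ones described:

```python
import socket

# Create the listening socket; under the hood the kernel allocates the
# SYN queue and the accept queue for this listener.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(128)              # backlog: max completed-but-unaccepted connections
port = server.getsockname()[1]

# A client connect(): the kernel completes the three-way handshake and
# parks the connection in the accept queue -- the app hasn't accepted yet.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

# accept() pops that connection off the accept queue and hands back a
# brand-new file descriptor representing it.
conn, addr = server.accept()
print("accepted connection from", addr[0])
conn.close(); client.close(); server.close()
```

Note that the client's connect() returns before the server ever calls accept() — the handshake is entirely the kernel's job, which is exactly why connections can pile up in the accept queue.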
If you receive a thousand connections, that backlog is going to fill up, and if you have a single thread dedicated to accepting connections, it can't possibly keep up. You'll reach a point where the accept queue is too small, so you increase it — but you still have a pile of backlogged connections waiting to be accepted, and now you're forced to spin up multiple threads to accept. I talk about all of that in my new course as well — there are tricks and challenges here. You can literally have your backend spin up two threads, or even two processes, that both listen on the same port. You might say: no, that's not right, you'll get an address-in-use error. Nope — not if you flag the listener socket with the SO_REUSEPORT socket option. With that, and a bit of coordination between the processes, you can theoretically have 16 threads listening on the same port. Do that and you get 16 accept queues. Do you get 16 SYN queues? I don't know — you'll probably get one SYN queue and 16 accept queues. Now when connections arrive, the kernel load-balances them across the 16 queues, and each thread reads from its own accept queue — as opposed to multiple threads fighting to read from a single accept queue, where you get limited by the accept mutex. It's basic semaphore thinking: you have to lock stuff, otherwise you get race conditions, and that's bad. So we've talked a lot about accept — there's so much you can do with just the accept step. Step number two — our work hasn't even begun; we just have a connection. Step number two is: you've got to read, buddy. You have a connection, the client's going to send you stuff — who's going to read it?
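Before moving on to reading: the SO_REUSEPORT trick above — multiple sockets on the same port, the kernel load-balancing connections across their accept queues — can be sketched like this. It's Linux-specific; the option may not exist or behave the same on other platforms:

```python
import socket

def reuseport_listener(port: int) -> socket.socket:
    """One of N listeners sharing a port via SO_REUSEPORT (Linux)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)   # each listener gets its own accept queue
    return s

# Two listeners on the same port -- normally the second bind() would fail
# with EADDRINUSE, but with SO_REUSEPORT the kernel allows it and will
# spread incoming connections across the separate accept queues.
a = reuseport_listener(0)
port = a.getsockname()[1]
b = reuseport_listener(port)
print("both listening on port", port)
a.close(); b.close()
```

In a real server you'd run one listener per thread or process and have each loop on accept() against its own queue, avoiding the shared-queue mutex entirely.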
Well, it's you — the backend application is going to read, or receive (recv, I suppose). But what really happens here, and what are we reading? Let's say it's port 443, so it's encrypted — let's spice things up a little and add encryption to the mix. The client sends requests one after the other — whether they're in streams or some other format — and they're encrypted with TLS. Now, I forgot to mention: once you accept a connection, you get an additional two queues per connection — one is the receive queue and one is the send queue. For each connection, you get a receive queue, where bytes arrive from the client, and a send queue, where your bytes as the backend application are placed before being transmitted across the network. Let's talk about just the receive queue. If a client sends an HTTP request and it's encrypted, those are bytes; the NIC receives those bytes, as we discussed, and the kernel identifies: oh, these bytes are going to this IP and this port, from this particular client IP and this particular client port. That four-tuple maps to a unique connection — and here's its receive queue. It's all pointers at the end of the day: the kernel moves the SKB — socket buffer, I think that's what it's called — to this connection's receive queue. So now you have a connection whose file descriptor you, the backend application, hold, and it has a receive queue with data in it — but you don't see that yet. The receive queue keeps receiving and accumulating data, and those packets are acknowledged by the kernel; all the TCP machinery is
done by the kernel — that was work done a long time ago, moving the TCP layer down into the kernel as opposed to user space. QUIC, by contrast, is being done in user space today: as far as the kernel is concerned, it's all UDP — it doesn't know whether it's QUIC or not (maybe it does, but it doesn't act on it; maybe in the future it will). But the acknowledgments, the window sizing — controlling, shrinking, and growing the windows — the congestion control algorithms, slow start: all of this is implemented in the kernel, and as a user you can of course tune its parameters. So that's where we are: you have a receive queue with a bunch of bytes. Now, as the backend application, you call read. You might say: Hussein, I've never in my life called accept or read or anything like that — well, the framework, the library, or the language often does that for you. So yes, your backend is reading bytes — but those bytes are raw, encrypted bytes. The backend reads them into a buffer — memory it allocated either on the stack or the heap, depending on how you do it: declare a variable in your function and that's the stack; malloc and that's the heap. The data is copied from the kernel-memory receive queue down to that user-space memory — and that's why io_uring, the new interface in Linux, tries to eliminate this unnecessary copying back and forth between kernel space and user space. But here's the thing: what you've now copied is encrypted data. What do you do with it? It's absolutely useless to you — it doesn't represent a request. Up to this step we don't know what a request is yet: step one, we accepted the connection; step two, we're reading bytes, and we have absolutely no idea what a request is. We're blind as a backend
application. So you're reading bytes, and those encrypted bytes are copied, as we said, into the user-space memory of your backend application, and they live there. Another challenge is the reading itself — how fast you read — because the receive buffer also has a limited size, so you as the backend application must read as fast as the client can send. And it goes the other way around, too: if you send data, the client must be able to handle what you send. It's a delicate tug-of-war kind of situation, and that's another thing you can investigate and invest in: the architecture of multiple readers. You can have multiple acceptors or one acceptor — up to you. Node.js, for example, is a one-acceptor, one-reader model, because it's a single thread and everything happens in that thread, literally — unless you do asynchronous work, in which case Node.js will use multiple threads under the hood. Every language and framework you pick will do this differently, but the fundamentals don't change: these things have to happen — where there's fire, there's smoke. So that's the second step: read. If things weren't encrypted, we'd be almost done here — read, then start parsing. But we can't parse anything yet: our protocol — HTTP, SSH, WebSockets, whatever protocol you built — cannot kick in yet. At step two we're reading ciphertext. If it were unencrypted, sure, we could start parsing the protocol and finding our requests — but not yet. That moves us to step number three: decrypting. Now that I have a bag of encrypted bytes in memory — and I know they're encrypted because I established a TLS session — I find my key, the symmetric session key I exchanged with the client, and use it to decrypt.
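Rewinding to step two for a moment — the read itself, stripped of encryption and everything else, is just a loop over recv(), each call copying whatever bytes the kernel has accumulated in the connection's receive queue into your user-space buffer:

```python
import socket

def read_all(conn: socket.socket) -> bytes:
    """Drain a connection until the peer closes.

    Each recv() copies bytes from kernel memory (the connection's
    receive queue) into this process's user-space buffer."""
    chunks = []
    while True:
        chunk = conn.recv(4096)   # returns *up to* 4096 bytes, often fewer
        if not chunk:             # empty bytes => peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Tiny demo over a localhost socket pair.
server = socket.socket(); server.bind(("127.0.0.1", 0)); server.listen(1)
client = socket.socket(); client.connect(server.getsockname())
conn, _ = server.accept()
conn.sendall(b"raw bytes, possibly encrypted, no requests yet")
conn.close()
data = read_all(client)
print(len(data), "bytes read")
client.close(); server.close()
```

Notice that at this level the bytes are featureless — nothing here knows where a request starts or ends, which is exactly the point of this step.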
Where does that key live? It's probably just sitting there in your user-space memory: pull it up and decrypt with it. If you're sophisticated, the private key and certificates live in a TPM somewhere on your motherboard, and you ask the TPM to do the decryption — or at least the session establishment — for you. You don't use the private key to decrypt traffic anyway; you mostly use it to sign things. The session key, though, most likely lives somewhere in your user-space memory, which you could call dangerous, but that's what we do today. And if another Heartbleed happened — it's unlikely to matter the same way, because these session keys are ephemeral: once you're done with the connection, you drop the session key. It's not going to live forever like the private key, so you're not going to get a Heartbleed situation out of it. The security folks are busy thinking about all these problems. So, step number three, decrypt: you take the session key and decrypt these bytes — and guess what, you have to copy them to another location. Now you're occupying roughly double the memory. Can you decrypt in place? I don't know, maybe — but as it stands you need more memory to hold the decrypted copy. And you usually don't do this yourself either: you use a library. You might not even know about any of this, because you use an HTTP library — or Node, or Python — that in turn uses whatever TLS library is installed on your machine, OpenSSL or LibreSSL or whatever, and that library is linked into your process, mapped into a specific set of virtual pages. The code is already on your machine; you just
use it to decrypt. So that library code kicks in — but the cost is yours: it's your process doing the execution, because that OpenSSL library is mapped into, linked with, your process. It's as if you're doing the work yourself. You might come back and say: hey, why is my backend process at 98% CPU? I'm not doing anything! Well, you are doing a lot of things — you just don't know about it. All this crypto magic — AES-128, AES-256, signatures, digests — happens in step number three, decrypting. Now we can toss the encrypted stuff; we have a nice region of memory full of unencrypted bytes. Step three done — and we took another hit. This is the first time we really take a CPU hit per se: step two didn't really need CPU — it was just a copy, an I/O operation with no real processing — and step one was basically just accepting. So this is the first step where we're genuinely burning CPU, and your process becomes somewhat CPU-bound whenever TLS/SSL is involved — watch out for that. Step number four: now that we have a bunch of unencrypted bytes, we begin parsing. We know our backend speaks some protocol — let's say HTTP/1.1, or HTTP/2, or SSH, or the Postgres or MySQL protocol — and now we actually parse. By the way, these steps are identical for any backend you can name — a database, a web server, a live-streaming server: any backend must do them. So, parsing: based on my protocol, I scan — all right, here's where my request starts, there's a request — against however many bytes I happen to have in memory so far. If I'm lucky, I might find one whole request in this chunk of bytes — or maybe
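In Python, the act of "linking OpenSSL into your process" is one stdlib import away — the ssl module is a thin wrapper over whatever OpenSSL/LibreSSL the machine has, and once you wrap a socket with it, every read transparently burns your CPU on decryption. A sketch of the server-side setup; the certificate paths are placeholders, commented out so the snippet stands alone:

```python
import socket
import ssl

# The ssl module wraps whatever OpenSSL/LibreSSL is installed on the
# machine -- and that library's code executes inside *your* process.
print("linked TLS library:", ssl.OPENSSL_VERSION)

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Placeholder paths -- you'd point these at your real cert and key:
# context.load_cert_chain("server.crt", "server.key")

def tls_accept(listener: socket.socket, context: ssl.SSLContext):
    """Accept a connection and hand handshake + crypto to the TLS library.

    After wrap_socket(), every recv() on the returned socket decrypts
    ciphertext into plaintext -- CPU spent in your process, not the kernel."""
    conn, addr = listener.accept()
    return context.wrap_socket(conn, server_side=True)
```

This is why a "do-nothing" backend can sit at high CPU under TLS load: the decryption shows up on your process's account, not as some external service.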
I'll be even luckier and find two requests — oh, there's one, and there's another — or luckier still, three requests. Isn't that cool, three requests in one shot? Which one gets processed first? That's a decision for the backend — let's not go there. Again, that's why I'm fascinated by backend engineering: there's no one way to do any of this, you can do it any way you want, and there are always pros and cons. On the bad side, you might be unlucky: the memory you just read doesn't even hold half a request. It happens — the request is large, so you accepted and read, but you only read half of the data; you decrypted that half, and the rest of the request is in the other half, or maybe it's still sitting unread in the receive queue, or maybe the client sent it and it simply hasn't arrived. Now you're in parsing hell: oh man, this is an incomplete request — what do you do with it? You put it aside, because you have other stuff to take care of. But what do you really do? You wait for more data to arrive so the request can hopefully complete — you have to, because you cannot fulfill an incomplete request. Well, unless you're uploading a file and you got enough information from the first chunk to know the remainder is just data — you can do tricks with that. But in most cases the backend will be waiting for more data to be read and decrypted before it can actually parse this request — and by parse I just mean find the start and the end. And sometimes the data that is available isn't a request at all: it could be control frames from the HTTP/2
protocol, like SETTINGS or WINDOW_UPDATE — okay, I can't do anything with that; it's not an actionable backend request. And here is where the second hit to your CPU happens: parsing. It's all CPU — you're reading through memory looking for certain things: finding specific headers, specific bodies, specific patterns for your request. HTTP/1.1 will give you the least amount of headache here, because it's a very vanilla, straightforward protocol with no magic to it. HTTP/2? There are a lot of extra headers and binary frames. WebSockets, same thing — it has its own headers you need to unpack and understand. So you're relying on a library that you hope someone wrote very efficiently to parse this stuff, and you take another CPU hit. Even today, in 2023, it's well known that HTTP/2 consumes more CPU than HTTP/1.1, simply because HTTP/2 does more. It does give you benefits — you can multiplex requests on the same connection — but at the cost of extra work. If you take advantage of multiplexing, the cost is amortized; but if your frontend sends one request and waits for the response before sending another, there's no point in using HTTP/2.
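The "half a request" problem above is why real servers keep a per-connection buffer and parse incrementally: append whatever bytes arrived, pull out as many complete requests as the buffer happens to contain (maybe zero, maybe three), and keep the remainder for next time. A toy sketch, assuming a trivial made-up protocol where each request is one newline-terminated line:

```python
class ConnectionParser:
    """Accumulate raw bytes and chunk them into whole 'requests'.

    Toy protocol: one request = one newline-terminated line. Real
    protocols (HTTP/1.1 headers + Content-Length, HTTP/2 frames)
    need the same buffering discipline, just more bookkeeping."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data: bytes) -> list:
        self.buffer += data
        requests = []
        # Pull out every *complete* request currently in the buffer;
        # a trailing half-request stays in self.buffer for next time.
        while b"\n" in self.buffer:
            line, _, self.buffer = self.buffer.partition(b"\n")
            requests.append(line)
        return requests

p = ConnectionParser()
print(p.feed(b"GET /books\nGET /aut"))   # one full request, one half
print(p.feed(b"hors\n"))                 # the half completes on the next read
```

The first feed yields only the complete request; the dangling half sits in the buffer until the next read completes it — exactly the wait-for-more-data behavior described above.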
You'd just be adding cost for nothing. So I always weigh the need for HTTP/1.1 versus HTTP/2 based on that, because I like to treat my CPU with respect. Just because we have something in abundance doesn't mean we have to waste it — it's like going to a restaurant, ordering everything, eating one meal, and throwing out the rest. Respect your CPU. Decoding is the fifth step — this one might actually happen before parsing, depending on the protocol, but it often happens after. Decoding is essentially figuring out what you're reading: whatever you got as a request — is it text, is it binary, what is it really? And if it's text, is it ASCII, is it UTF-8? The bytes might look the same, but they can mean completely different things: an emoji's bytes don't mean anything in ASCII — they might render as a couple of letters — but in UTF-8 they mean a dog. That's why decoding is critical. And UTF-8 can take up to four bytes for certain characters — kanji, I think, can take up to four bytes for some Japanese characters. So you decode based on the language of the content — and by language here I mean the human language, not the programming language per se. Also, while not common, sometimes the request itself is compressed, and you need to decode that too: if the client compressed the request with gzip, for example, the backend has to decompress it, and that's another CPU hit. So now we have three steps hitting the CPU: TLS, parsing, and decoding. You might say these are tiny costs — who cares, why even measure them? Well,
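Both decoding costs are easy to see with the stdlib alone — the same bytes that are invalid as ASCII are a dog emoji in UTF-8, and a gzip-compressed body has to be inflated (more CPU) before you can even look at it:

```python
import gzip

# A dog emoji is four bytes in UTF-8 -- the "up to four bytes per
# character" case -- and those bytes are not valid ASCII at all.
dog = "🐕".encode("utf-8")
print(len(dog))                      # → 4
try:
    dog.decode("ascii")
except UnicodeDecodeError:
    print("not ASCII")

# A compressed request body must be decompressed before parsing --
# another CPU hit the backend pays before it ever sees the text.
body = gzip.compress(b'{"title": "some book"}')
print(gzip.decompress(body))         # → b'{"title": "some book"}'
```

Same bytes, totally different meaning depending on the declared encoding — which is why a backend can't skip this step even when it feels trivially cheap.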
knowing about it doesn't hurt, because these costs add up in abundance — and if you move your application to an IoT device, all of a sudden you'll ask: why is this application at 100% CPU, why is everything slow? Because you were doing all this work all along; you just never felt it on your 128-core CPU, and you took it for granted. Final step: now that we've decoded, now that we know the start and the end of the request, we fire the event — if the backend process has such a thing — a callback: hey, I just got a request! Yay, I just got a request — do whatever you want with it. But did you see all the stuff that happened before that? This is all we usually see as backend developers: okay, I just received a request, now let me process it. And even here there's sometimes a step before you process. I'm glad that Express.js, for example, doesn't hide this — it makes you say explicitly: hey, this request contains JSON, parse it. That's a different parsing from the parsing in step four. We have the request, the request has a body, the body is of type JSON — and it's still only bytes. If you want to turn those JSON bytes into something you can actually work with in the application as objects, you deserialize. That's a step I didn't list on its own — you could say it happens right before processing, or shove it in with decoding if you like: deserialize this JSON into a JSON object so I can call .whatever on it and play with it in JavaScript; in Python, you move those bytes into dictionaries — that's what JSON becomes in Python. You move these bytes into the data structures of your programming language of choice; if you're using Node, it's JavaScript's native JSON, and so there will be
Deserialization, boom: from serialized bytes to an object you can use. If it's Java, it's going to be a bunch of objects. And boy, this is really costly; it depends on the language, but there's a lot of overhead in JSON parsing and deserialization. So that's another step.

Then finally, once you have the JSON object, you can actually process, because now you know what you were sent. And what does it mean to process? Well, you received a request, say GET /books: go give me the books. Alright, let me do it. Now you turn around and do SELECT * FROM books LIMIT 10: give me the first ten books. And now your backend becomes a client, the database becomes the backend, and you go through the exact same steps. Nothing changes; the fundamentals are the fundamentals.

Then once you process, you go through the reverse of these steps. You still do them, but in reverse, because now you're writing instead of reading. You have the connection and you write to the send queue; before you write, you have to encrypt; before you encrypt, you have to encode; and before you encode, you serialize. So serialize, encode, encrypt, and write to the send queue. The kernel takes care of the rest and sends your encrypted bytes to the client, and the client does the whole thing in reverse.

The processing side is also interesting, because for any one of these six steps, and I'm not exaggerating, you could write a white paper, you could do your Masters or PhD on each one of them. Take processing first: how do you process requests? You have a fleet of requests arriving at your backend. And don't involve distributed systems here; for simplicity, what about learning what happens on a single machine?
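The response path above (serialize, encode, optionally compress, then encrypt and write) can be sketched like this; the TLS and socket parts are left as comments, since the TLS library and the kernel handle them, and the books list is a made-up query result:

```python
import gzip
import json

# Hypothetical result of SELECT * FROM books LIMIT 10.
books = [{"id": 1, "title": "The Pragmatic Programmer"}]

payload = json.dumps(books)          # serialize: objects -> JSON text
encoded = payload.encode("utf-8")    # encode: text -> bytes
compressed = gzip.compress(encoded)  # optional: compress for the wire

# Encryption and the socket write are handled by the TLS library
# and the kernel; conceptually:
#   sock.sendall(tls_session.encrypt(compressed))
print(len(encoded), len(compressed))
```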
Let's not care about multi-process or distributed setups; one machine only. To me, you should be as efficient as possible on one machine first, otherwise distributing too soon is a cop-out. It's like saying: well, the CPU is at 100%, add more machines. Why is it at 100%? Which of these steps is actually consuming your CPU or your memory? "Well, I don't care." Sure, okay, just throw more RAM at it, make the venture capitalists happy, and move on with your life. You can do that, sure, but you can also be an engineer and understand these things, and that's what fascinates me. I wake up every day and pick one thing that I think I know, and it turns out that I don't.

Think of just processing requests: you can have your own architecture to process them, even if it ends up distributing to another machine. You have a bunch of requests, so you spin up a pool of worker processes, five, six; what is this number, what does it depend on? Well, it depends on the workload: what is the nature of your requests? It's definitely out of the scope of this video, but is your workload CPU-bound or I/O-bound? Is your request traversing a one-billion-node graph looking for something, which is CPU-intensive, or is it reading from a database or from another service, which is I/O, or is it both, where you read the graph, that's I/O, and then traverse it, that's CPU? That's how you scale; that's how people pay, or paid, millions of dollars just to understand how to scale things, how to size their machines. So yeah, that's what I'm interested in, to be honest: understanding the bottleneck, understanding the nature of your requests, and what happens through all of that.
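A toy sketch of that worker-pool idea; the pool size and the handler are placeholders, and the right size depends on whether your requests are CPU-bound or I/O-bound:

```python
from multiprocessing import Pool

def handle(request_id: int) -> int:
    # Placeholder for real work: CPU-bound (traverse a graph) or
    # I/O-bound (query a database, call another service).
    return request_id * 2

if __name__ == "__main__":
    # For CPU-bound work, size the pool near the core count;
    # for I/O-bound work, more workers (or async I/O) pay off.
    with Pool(processes=4) as pool:
        results = pool.map(handle, range(10))
    print(results)  # [0, 2, 4, ..., 18]
```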
I hope you enjoyed this episode, and I'm going to see you in the next one. You guys stay awesome. Goodbye.
Info
Channel: Hussein Nasser
Views: 47,417
Keywords: hussein nasser, backend engineering, linux kernel, backend kernel, backend request
Id: gSQoA4SYhJY
Length: 51min 26sec (3086 seconds)
Published: Tue Aug 01 2023