GoLab 2018 - Filippo Valsorda - Building a DIY proxy with the net package

Captions
Hi, I'm Filippo. I work on the Go team at Google, and I've done various networking and cryptography work in the past, but today we're here to have a look at doing networking with the net package in Go. Before we start, as a show of hands: who here has ever used the net package directly? Not through net/http, not through gRPC, but directly importing net, calling Listen, and working with connections? That's not a lot of hands, and that's a good thing for two reasons. One reason is that it means we're providing you with the high-level interfaces you need to get your work done without working directly with connections. The other reason this is good is that what we're going to do today is look at how to use the net package to implement useful programs while dealing with connections directly.

Before we go into using the package, I want to talk for a bit about networking in Go, because networking in Go is a bit of its own special thing, and to understand why, I want to take a step back and look at how it used to work in other, older systems. For a long time, one option has been to build threading servers. The scenario here is that you have a server, you want to accept some connections and serve them, and you'd like to be able to serve more than one connection, more than one client, at the same time from the same server. One option is that every time a connection comes in you start a new thread. A thread is essentially a kernel-level process that shares some memory with the original program. There are a lot of small differences between processes and threads, but what we care about is that the kernel does all the scheduling for threads: we just run our code sequentially, we block, and the kernel takes our thread and puts it on the various CPUs when it needs to run code, and takes it off when it needs to run something else on the same CPU. The disadvantage of this is that threads are not that lightweight. A thread comes with a whole system stack, which can be 2 or 4 megabytes depending on how your system is set up, it takes the kernel to schedule your thread, and every time your thread comes in and out of the CPU it has to freeze all the state needed to go back to executing it.

So over time a new pattern developed, called the event loop. Popular high-performance programs like nginx use this pattern. How this works is that instead of having one thread per connection, per client that you are trying to serve, you have a pool of threads, a small, limited number of threads, and you have an event loop. One thread's job is just to ask the kernel: let me know if any of these events happen. When one of those events happens, the event gets put on a queue, and then this single thread goes to the queue and picks up something that needs to be done. For example, there's a new connection: it takes the new connection, looks at who the client is, prepares the answer, and sends the answer. But it doesn't wait for the send to finish, it doesn't block: it tells the kernel, "okay, I want to send this, let me know when the send is done," puts the connection back into monitoring for events, and goes back to the event queue to take one more thing from it,
for example, this client has read your response and now it's sending another request. It keeps going through this loop, where it tries to do as much work as possible without blocking: without stopping to read files, without stopping to wait for connections, without stopping to read data from the network, doing only the things that run on the CPU, and then it saves state and goes back to the event loop. This is particularly performant because it never uses kernel resources just to stop and wait for things to happen, but it's also fairly heavy on the programmer. The nginx developers have to take care of how they save the state, how they remember who this client was; they have to remember that this request is from the same client as that request, and they have to restore all that state. It's not a pleasant way to write code, it's not simple, and in Go, programmers prefer simple.

So how does Go work? Go brings us the best of both worlds, because when you write Go code you don't have to worry about the event loop, you don't have to worry about the event queue, you don't have to worry about epoll and asking the kernel to let you know about many things at the same time. You just write things as if they were threads: you start your goroutine, accept the request, block to send the answer, and the goroutine resumes running once the client is ready to take the answer or once it comes with the next request. But what's happening under the hood? Under the hood there's a scheduler, the Go scheduler, which runs inside your program, inside the runtime, and that scheduler runs all that complex epoll or kernel-specific logic to have the kernel efficiently tell it which things are ready to run at any given time. So any time you block, for example you say "hey, I want to read this file, because this is the file I want to send to this client," you just call read and wait for it to return. What's actually happening is that your goroutine is yielding to the scheduler; it's telling the scheduler, "okay, I'm going to block here, and we don't want to block the CPU, so take me off the CPU, but let me know when the file is ready." We don't always realize it, but computers are extremely fast for a human, and there are very wide differences in latencies: things that run on the CPU are fast, things that fetch from RAM are fairly fast, things that read from files are slow, and things that read across the network are impossibly slow. So this is what the Go scheduler does for us: it takes the code we write, which is just blocking code, and turns it into code that behaves as if it was written with an event loop. It takes our code, which is as easy to write as threading code, and turns it into code that is as fast as the event loop. Now we're going to see how we actually use this package. All the things I told you about happen under the hood; there is a netpoll, there are threads, but we don't have to worry about them, as we will see. We just go on our way servicing our connections.

All right, time to start coding. The first thing we need to be able to do is accept connections. We are writing a small server, so we want to listen on a port and an address and accept new connections as they come in. We do that with one of the two core interfaces of the net package, the net.Listener interface. It simply has an Accept method, which returns a connection, when someone connects to us, and an error, and then there's Close, for when you want to stop listening, which is not something we want to do for now.
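For reference, this is roughly the shape of that interface as defined in the standard library's net package (comments paraphrased):

    type Listener interface {
        // Accept waits for and returns the next connection to the listener.
        Accept() (Conn, error)
        // Close closes the listener; blocked Accept calls return an error.
        Close() error
        // Addr returns the listener's network address.
        Addr() Addr
    }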
Every time we call Accept we block until there's a new connection, and when there's a new connection we get a value of type Conn, which is another interface. It has a Read, so it's an io.Reader, a Write and a Close, so it's an io.Writer and an io.Closer, because in Go interfaces are composable: you don't have to say "hey, this implements this and this and this," you just implement the methods and the types are compatible. Then there are some methods for addresses, and methods for deadlines, which is how we implement timeouts.

Very well. We start with our program, with func main like a normal main package, and we start by getting a new listener. We say that we want to use the TCP protocol, the same protocol that backs HTTP for example, because we want reliable delivery of packets: it takes care of retransmitting packets that get dropped and making sure everything arrives in order. The alternative would be UDP, which does not provide those guarantees. We check errors, because of course, and then we have our listener. With our listener we need to build what we call the accept loop, because we want to keep accepting new connections; our server will serve many clients. On each iteration of the loop we get a connection by calling the Accept method. If there is an error, we are supposed to check whether it's a temporary error and in that case retry, but for now we're just going to quit the program if we can't accept connections. Then we need to do something with this connection. As we know, Conn is a reader and a writer: something we can read from to get data from the client that just connected to us, and write to to send data back. For example, we may want to copy that data to standard error. For that there is a simple io function, io.Copy, which takes a destination writer and a source reader. A writer can be anything that implements Write, so for example it can be a file, and standard error is an os.File, which has a Write method, so we can pass it as the writer, and as the reader we can just pass our connection. This returns the number of bytes and whatever error happened during the connection, and we're just going to print "completed connection" with a %d for the count and a %v for the error.

Okay, so this is enough to write a small program that accepts connections and copies everything that comes in on the connection to standard error. We just run it. To connect to it we can use a command-line tool called netcat. netcat is simply a Unix tool that, from the command line, opens a connection to some port and address and lets us send data from the terminal. We send the data from here, it gets accepted, goes into io.Copy, and ends up on standard error. Now, this looks like it's working, but it has one minor problem: what happens if we try making a second connection? Because, you know, we have many clients that connect to our website, and they happen to connect at the same time. We try connecting from this one and nothing happens. Why is that? Because now we are blocking inside io.Copy here, and while we are blocked inside io.Copy we are not accepting new connections; only when we return from one of the copies, by closing this connection, do we start accepting the next one and go through the loop again.
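Here is a minimal sketch of this first, intentionally blocking version; the listen address and log messages are my choice:

    package main

    import (
        "io"
        "log"
        "net"
        "os"
    )

    func main() {
        // Listen for TCP connections; the address here is arbitrary.
        l, err := net.Listen("tcp", "localhost:4000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := l.Accept()
            if err != nil {
                log.Fatal(err) // for now, just quit if Accept fails
            }
            // Copy everything the client sends to standard error.
            // Note: this blocks the accept loop until the client disconnects.
            n, err := io.Copy(os.Stderr, conn)
            log.Printf("completed connection: %d bytes, err = %v", n, err)
        }
    }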
So what we need to do here is use goroutines. We move the copying into a function that we call copyToStderr, which takes a net.Conn, which is an interface, and does what the code was doing before, and every time we accept a connection we start a new goroutine, simply with the go keyword, running copyToStderr on our connection. Now we run this, and we can run it with two connections, and they're both running at the same time. What's happening is that the Go scheduler underneath is keeping tabs on both this connection and that connection, and as soon as one of the two sends something, it flags the goroutine as runnable, puts it on a CPU, and lets it do its job, which in this case is to read from one side and write to the other.

Very well. Now, what I showed you about blocking the accept loop seems like a fairly hard mistake to make, because usually you're not tempted to put all your work inside the accept loop, preventing Accept from running the entire time. But something that happens often is that you call Accept and then do just a little work before going through the loop again. For example, you call Accept and you read something, say the header, because this way you have the header and you can start the goroutine with all the information it needs. Usually normal clients will open a connection and immediately send some data, so you will not notice that this is broken until someone opens a connection and doesn't send anything. At that point they're holding up your Accept from running, and your server is down, because nothing else can connect. This is a security vulnerability, and it happens: there was an open source DNS server that was doing exactly that. It was accepting a connection and reading the request, which is very small, usually a single packet, before starting the goroutine that would service it. But if a client were to open a connection and not send anything, the server would go down, because it would not be accepting new connections. So it's important to keep in mind that the accept loop must contain nothing other than Accept and non-blocking code: things that can run very fast without blocking on anything.

So, to recap: we've seen how to get a listener by calling net.Listen, how to use the listener to accept connections, how to service connections using goroutines, and how we never block in the accept loop.
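Roughly, the fixed version looks like this, with the copy moved into its own function and started with the go keyword (same imports as the previous sketch; the names are my choice):

    func copyToStderr(conn net.Conn) {
        defer conn.Close()
        n, err := io.Copy(os.Stderr, conn)
        log.Printf("completed connection: %d bytes, err = %v", n, err)
    }

    func main() {
        l, err := net.Listen("tcp", "localhost:4000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            // The accept loop does nothing but Accept: all the work,
            // and all the blocking, happens in a separate goroutine.
            conn, err := l.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go copyToStderr(conn)
        }
    }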
Next, let's build something a bit more useful, because this was okay, something that can copy things from the connection to the terminal, but let's build something that we could use in practice. We're going to build a small proxy. A proxy is nothing else than a small service that takes a connection, opens a connection to an upstream, some other service, and carries data from the client to the upstream and back from the upstream to the client. It simply sits in the middle and proxies things from one side to the other. For example, here we're going to make a proxy that accepts connections from a browser, opens a connection to google.com, and sends the data over to google.com, but you can proxy between anything. To do that we're going to write a new function, and there is one more problem in the previous function that we haven't addressed: it has no timeouts. Now, I told you that goroutines are fast, and that's true, and that goroutines are cheap, and that's true, but when a goroutine is holding some kernel-level resource, for example a file descriptor because it's holding a connection open, it becomes precious, because you don't have that many file descriptors on your system. Now, who here has ever seen the error where the server can't accept and retries in one second because it ran out of file descriptors? Lots of nodding, lots of pain, shared pain. When that happens, it's because you have a lot of connections open and you don't have enough file descriptors in your limit to accept new ones. Usually it means either you have too much load or, more often, you don't have timeouts that kill connections that have been open for too long.

So how do we apply timeouts? Timeouts are applied with deadlines. Deadlines in Go are a hard, absolute time at which the read or the write either completes or returns with a timeout error. It's not an idle timer; it's not that you have to send something every 10 seconds; it's a deadline. To use that, we're going to have to write our own copy function. We write a for loop where we have a buffer of some size; it doesn't matter what size, because it's just the chunks in which we're going to read and copy things over. We first do a Read on the connection into the buffer, and if there's an error we just print it and return. Then we do a Write on os.Stderr of the buffer up to n, because Read might not read as many bytes as there is space; it might be less, and we want to copy only up to there. When an error happens, we hit this return: when the connection is closed by the other side, the error returned will be an EOF error, end of file, and if the error is something else we still want to be sure that we're closing the connection, so we defer the close. Now we try this new one... oh, of course, we wrote all this to add the deadline: at every iteration, just before we do the Read, we set a deadline. For example, we say that we want the Read to complete within time.Now() plus, say, five seconds. Now when we run this we have only 5 seconds to stay idle, 1, 2, 3, 4, 5, before the connection gets closed. This way, if a client just opens a connection and never does anything, it's not holding up one of our precious file descriptors.
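A sketch of that deadline-aware copy loop, assuming a 5-second read deadline and an arbitrary chunk size (time is the extra import needed):

    func copyToStderr(conn net.Conn) {
        defer conn.Close()
        buf := make([]byte, 1024) // chunk size doesn't matter much
        for {
            // The deadline is an absolute time, so push it forward before
            // every Read: each Read must complete within 5 seconds.
            if err := conn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
                log.Println(err)
                return
            }
            n, err := conn.Read(buf)
            if err != nil {
                // io.EOF when the client closes the connection,
                // a timeout error when the deadline expires.
                log.Println(err)
                return
            }
            // Read may return fewer bytes than the buffer holds;
            // only write out what we actually got.
            if _, err := os.Stderr.Write(buf[:n]); err != nil {
                log.Println(err)
                return
            }
        }
    }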
Okay, so we've seen how to add timeouts, but we will not be adding them to the proxy, because that requires a bit more thinking about which timeouts you really want. It depends on how much you trust the upstream: you might want to defer all closing of connections to the upstream if you trust the connection to it, maybe because it's local, and if it's a remote upstream you want to make sure you have timeouts on both sides of the proxy. But that's a matter of configuration. Now, to write our proxy, we want to copy things in both directions: we have our connection from the client, and we want a connection to the upstream. To get it we use the other main function of the net package, Dial. We use Listen to accept connections, and we use Dial to connect to something else. So we connect to google.com at port 443, because that's the port of HTTPS, and we make sure to defer closing this connection as well, so that when we're done with the proxy we close both. This is another way to end up without file descriptors: if you only close the downstream connection and leave the goroutine, there's now a dangling connection open on the other side. Then we need to do copies in two directions: before, we were just copying things from the client to standard error, and now we want to copy things from the downstream connection to the upstream and the other way around.

Now, something you should be worried about here is that I am starting a goroutine without tracking it. That's usually a bad sign, because usually when you start a goroutine and you do not know how it ends, it ends with you getting paged. Here it's fine, because we know that as soon as this copy returns an error it will hit the defers and close both connections; closing both connections will cause an error on the other io.Copy as well, and it will bring down the goroutine, because when the function a goroutine is running returns, the goroutine goes away. So now we run proxy on our connections, we go to a browser, and we load the page. It looks like an error, but it's actually working. How do we know it's working? That error wasn't generated by us; it was generated by google.com, because we just asked google.com to serve localhost, and google.com went "I don't really host localhost." Something else we can look at is that the certificate it tried to serve is a certificate for google.com. Why? Because we're proxying the connection: we are taking all the bytes from the client and sending them to the server, and all the bytes from the server and sending them to the client. The server is sending a certificate for google.com, which of course is not valid, because we are trying to connect to localhost. This is not some security attack; this is how things are supposed to work.

So we've seen how to build a small proxy with Go, and what if I told you that this proxy is literally as fast as it can get? I'm not joking, because what happens under the hood is that io.Copy notices that on one side it has a TCP connection and on the other side it also has a TCP connection, so it goes into the kernel and tells the kernel: take these two file descriptors and just send from one to the other. The kernel does all the work from there on, so it can't go any faster, because the program is not involved at all. This is what nginx does: it's famous for using sendfile to send from files to TCP connections and splice to send from TCP connections to TCP connections. Go does that magically for you by the power of interface upgrades, which are one of my favorite patterns in Go. Inside io.Copy there is a bit of code that checks whether either the destination or the source knows how to take the other side and take care of the copying itself, that is, whether it has a ReadFrom or a WriteTo method, and if it does, io.Copy takes one, passes it to the other, and tells it: okay, go away and do your thing. TCP connections know how to take another TCP connection and tell the kernel to just copy from one to the other. So with these simple 30 lines of code we are using the fastest possible way at the kernel level to copy things from one connection to the other, and we didn't even notice. Isn't that nice?
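Putting that together, a sketch of the proxy function as described; the hard-coded upstream address and the error handling are my choice:

    func proxy(conn net.Conn) {
        defer conn.Close()
        // Dial the upstream; 443 is the HTTPS port.
        upstream, err := net.Dial("tcp", "google.com:443")
        if err != nil {
            log.Println(err)
            return
        }
        defer upstream.Close()

        // Copy in both directions. The goroutine below isn't tracked, but
        // that's fine: when the inline copy returns, the deferred Closes run
        // on both connections, the copy in the goroutine errors out, and the
        // goroutine exits.
        go io.Copy(upstream, conn) // client -> upstream
        io.Copy(conn, upstream)    // upstream -> client
    }

In the accept loop this would replace copyToStderr, as in go proxy(conn); and because both sides are TCP connections, these io.Copy calls can hit the interface-upgrade fast path described above and hand the work to the kernel.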
Okay, now I think I have five minutes left, but it wouldn't be a talk of mine if there wasn't a bit of TLS, so we're going to very quickly look at how to parse TLS when things arrive and then proxy them. Now, TLS is an overly complex protocol, 20 years of legacy built on top of each other, with things that we would very much like to remove but can't. It starts with a record header, which has a type, a version that doesn't mean anything anymore, and a length; inside that, the first thing is a handshake message, which has another type, different from the first type, and another length, because there are a lot of lengths, and then a ClientHello. Inside the ClientHello there is another version, which doesn't mean anything anymore either, a random value, and a session ID that doesn't mean anything anymore. It's bad, but thankfully I'm not going to write a parser on stage. Instead, we're going to use a parser I wrote to extract one value: the server name indication, the SNI. It's what the browser sends to let the other side know which certificate to serve. In this case it's going to send localhost, because that's what we are trying to connect to, but any time you connect to something, it sends, in the first packet, in the clear, "here is the name of the website I'm trying to connect to, so give me the certificate for that if you have it." Nobody has a certificate for localhost that works everywhere.

How we're going to write this: first of all, we need a buffer to write things into. First things first, we receive a connection, defer closing the connection, and set a deadline, because we don't want to block forever reading the ClientHello; that's exactly what we just said we don't want to be doing. So we set a read deadline of five seconds, and then we create a new buffer, a bytes.Buffer, and we do a read, but this time we use io.CopyN, because we want to read just the header part of the record. We copy, into the buffer, from the connection, one byte of type, two bytes of version, and two bytes of length. Then, based on those two bytes of length, we read the rest of the first message, because that's what you have to do when you have variable-length messages on a stream protocol. For that we use the encoding/binary package, the big-endian encoding, and its Uint16 function on buf.Bytes() — ah, invalid type; no, it just didn't have the import; and yes, it has to be a pointer here. We don't want the first three bytes, so we take bytes 3 to 5. That gives us the length of the next chunk, and we read it in the exact same way: into the buffer, from the connection, length bytes' worth. Now that we have the entire ClientHello in the buffer, we're going to cheat and use the existing parser, by calling parseClientHello and passing it the buffer's bytes, and if that worked we print the SNI field of the parsed ClientHello. If this all works, we're reading the first packet, finding the SNI value, and printing it to the terminal. Oh, don't you dare... So we reload this; it's not going to connect, it's still trying, because we did not change it from proxy; it's going to say connection closed, because we're doing nothing else after closing the connection, and it's going to tell us what it just read. So, to recap this part: we're reading the header, parsing it, and printing the SNI.
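A sketch of that header-then-body read (extra imports: bytes, encoding/binary, time); parseClientHello stands in for the existing parser mentioned in the talk, and its ServerName field is a placeholder name:

    func handle(conn net.Conn) {
        defer conn.Close()
        // Don't wait forever for the client to send its ClientHello.
        conn.SetReadDeadline(time.Now().Add(5 * time.Second))

        buf := new(bytes.Buffer)
        // TLS record header: 1 byte type + 2 bytes version + 2 bytes length.
        if _, err := io.CopyN(buf, conn, 5); err != nil {
            log.Println(err)
            return
        }
        // The last two header bytes are the length of the record body.
        length := binary.BigEndian.Uint16(buf.Bytes()[3:5])
        // Read the rest of the record, which contains the ClientHello.
        if _, err := io.CopyN(buf, conn, int64(length)); err != nil {
            log.Println(err)
            return
        }

        hello, err := parseClientHello(buf.Bytes()) // hypothetical parser
        if err != nil {
            log.Println(err)
            return
        }
        log.Println("SNI:", hello.ServerName)
    }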
Now, the last thing: what if we wanted to proxy the connection after we did this? The problem is that to do that we would have to copy things from the existing connection, but we already read part of it, so if we proxy everything from here, we're only proxying from here on, and we're not sending that important first packet over to google.com, and the connection will break. How do we fix that? We fix it using io.MultiReader, and this is where interface composition in Go is so beautiful, because we can take two readers. First we make a new type, a wrapper for a net.Conn, which we're going to call prefixConn. We're out of time, so I'm going to copy-paste this, because of course I have it ready, but I'm going to explain how it works at least. What we are doing here is creating a small net.Conn wrapper: something that exposes all the methods of a net.Conn but also holds a reader, with a Read method that tells it which reader to use, because otherwise it wouldn't know which of the two to use. Then we use that as our new connection, after giving it, as its reader, an io.MultiReader: first it reads, from our buffer, the things that we've already read from the connection, and then it reads everything else from the connection itself. Then we can just proxy that conn with the exact same function we were proxying the other one. Basically, we're stitching together the buffer we already had and the rest of the connection, to proxy all the rest.
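A sketch of that wrapper type and how it might be used; the name prefixConn and the exact layout are my guess at what was pasted on stage:

    // prefixConn behaves like the wrapped net.Conn, except that reads go
    // through r, which first replays the bytes we already consumed and
    // then continues with the rest of the connection.
    type prefixConn struct {
        net.Conn           // embedded: Write, Close, addresses, deadlines
        r        io.Reader // io.MultiReader(alreadyReadBytes, conn)
    }

    func (c prefixConn) Read(p []byte) (int, error) { return c.r.Read(p) }

    // After parsing the ClientHello out of buf, proxy the whole stream,
    // including the bytes already sitting in buf:
    //
    //     c := prefixConn{Conn: conn, r: io.MultiReader(buf, conn)}
    //     proxy(c)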
Ah, it's not happy, because we have to proxy the new conn. So if we stop this and run it, we see that it's both proxying and showing localhost. And this is actually something very useful in practice: this is what's called an SNI-based router, because it can look at the SNI and, based on which website you want to connect to, decide where to send your connection. With these 56, all right, 65 lines we've written something that has the same core feature as things like HAProxy, and that's all thanks to the net package. So thank you very much, and at this point I think we have a couple of minutes for questions. [Applause]

Question: what if the SNI gets encrypted? What he's referring to is a long-standing privacy issue on the internet, which is that this value is unencrypted; that's why we could read it from the connection. The problem is that to encrypt it we need to know a key for the server before we make the connection, and that's been a key distribution problem for a long time. It's being worked on, with a way to distribute keys through DNS so that we would then have the keys to encrypt the SNI, but that's a new standard that's still going through the IETF process. The answer is: if you own the infrastructure, you would have to decrypt TLS at your HAProxy, at your proxy level, completely decrypt it and then proxy it, maybe re-encrypting it on the other side. Thankfully, these days TLS is extremely fast, so it's not really a performance issue anymore. The other day I made a mistake developing the crypto/tls library in Go: I added a single allocation to reading records and it got 30% slower. So decrypting each record is about as slow as doing three allocations in your code, and I'm sure you're not counting all the allocations you do in your proxies, so stop worrying about TLS performance; it's fast. Anyone else?

Question: in the proxy function, where you have the goroutine that does the copy in one direction and the other one inline, could you have used the net.Pipe function, or does that not do the interface upgrade? What net.Pipe does is give you two net.Conns that talk to each other, but here we already have two conns, the downstream and the upstream conns, and we want to copy between them, so that code would not help us here; net.Pipe would not help us here. Also, net.Pipe is in-memory, which is kind of weird: when you write your tests using net.Pipe, it does not really behave like TCP, because with TCP there are buffers, you can send a bunch of stuff even if the client doesn't read it all, and it sits in a buffer up to a certain window, while net.Pipe takes each write from one side and waits for a read on the other side. I'm saying this because I just finished having to fix that in the crypto/tls package in the standard library. Thank you. [Applause]
Info
Channel: GoLab conference
Views: 2,372
Keywords: GoLab 2018, web, Filippo Valsorda, diy proxy, net package
Id: J4J-A9tcjcA
Length: 40min 9sec (2409 seconds)
Published: Wed Jan 30 2019