Keeping the Balance: Load Balancing Demystified

Captions
Now we have Murali, who is going to talk to us about load balancing.

OK, can everyone hear me? Yes, I can hear my own voice. Excellent. Good afternoon, everyone. My name is Murali Suriar, and this is a talk that a colleague of mine, Laura Nolan, and I put together, primarily because there is very little reference material on how to do integrated load balancing up and down the stack. There are plenty of things that tell you "this is how you do network load balancing", or "this is how you do DNS load balancing", or "this is how you do highly available web servers"; there's very little about how to do top-to-bottom stuff that is modern. The last one I'm aware of is Theo Schlossnagle's book from the early 2000s.

I am a computer scientist and network engineer turned network SRE. SRE is Google's take on how to do operations with software. I now work in storage, but I spent about eight years doing networking things. Laura went the other way: she started off as a software engineer, became a non-networking SRE, and then joined networking later.

So why are we talking about load balancing? Load balancing failures turn into dropped requests. The short version is that I, and I suspect many other people in this room, have spent a lot of time debugging things that turned out to be load balancing failures. The goal here is to start with a very simple topology and then layer existing load balancing technologies on top of it, until we're describing what could be a fairly large, highly scalable architecture. This is not a "this is how you should do load balancing" talk; it's a "here are all the tools available to you, how they interact, and what you should think about when choosing which of those tools to use" talk.

Just because we needed to pick the top or the bottom to start with, we're going to start with a network-focused view and then move our way up. We'll start with a very simple web app: a static content service serving photos of owls. The highest quality owls you've ever seen on the internet. For those of you who aren't aware, this is an actual subreddit; it's great during the Super Bowl, because there are lots of very confused people trying to work out why there are pictures of owls.

We'll start with the web app on a single server, a single machine. We're going to hand-wave over how you scale your application; there are many good references on that. If you want a recommendation from me, Martin Kleppmann's Designing Data-Intensive Applications is a fantastic book.

So your customer, who is a vendor of owl-related products, is setting up their first website, superbowls.com. This is where we're starting. Ninety-nine times out of a hundred, what this will end up looking like is: you have a server in a rack somewhere; you assign a public IP address to that server (we're using IPv4 just out of laziness), say 203.0.113.20; your colo provider's or your internet service provider's edge routers advertise a prefix containing that address to the internet via BGP; and you have a DNS provider somewhere who is authoritative for superbowls.com, and you tell them to serve an A record for that name which points to the address of your server.
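To make the client's view of this setup concrete, here is a minimal sketch (not from the talk) of resolving the name and seeing the single A record; superbowls.com and 203.0.113.20 are the talk's example name and address.

```python
# A minimal sketch: resolve the service name and print whatever A records
# come back. superbowls.com / 203.0.113.20 are the talk's example values.
import socket

def resolve(name: str) -> list[str]:
    """Return the IPv4 addresses the DNS hands back for a name."""
    infos = socket.getaddrinfo(name, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Deduplicate while preserving the order the resolver returned.
    return list(dict.fromkeys(info[4][0] for info in infos))

print(resolve("superbowls.com"))  # with one server: ['203.0.113.20']
```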
There is basically very little redundancy in this picture. If you're lucky, your colo provider will have redundant uplinks to the internet, but other than that we haven't really put any kind of high availability in here at all. It's also very hard to scale: your only scaling dimensions are the size of the link between your server and your colo provider (maybe it's one gig, then it moves to 10 gig, then 40 gig) and how big your server is. Those are the only two things you can fiddle with in this scenario.

It doesn't have load distribution: there's one web server, and we can't serve any more traffic than it can handle. You can keep throwing money at your hardware vendor, but that's it. You don't have any high availability, really: if your machine goes down, or if any of those components goes down, your customer is down; they cannot serve their customers. There is some amount of load distribution in the network, though: network engineers kind of do redundancy by definition, architecturally. You'll end up with redundant paths across the internet, or even redundant paths between two points in a single data center. So the network piece of this, which we're mostly going to hand-wave over, can be increased in size mostly transparently to the user, and similarly this is true on the public internet: there are many redundant paths through it.

The problem here is that we have some high availability, but it's mostly outside of our control: it's in the control of our ISP or our colo provider. And you have questions like: what happens if you have a flaky link inside your data center? Say the link between your server and your colo provider is flaky. Is that your problem? Is that your colo provider's problem? What can you even do about it if you only have the one link?

So the first thing you're probably going to do is add a second server. Let's say that, for whatever reason, we have chosen Raspberry Pis as a hardware platform, because we've all gone to LCA and got a free one. I don't know how many owl pictures a Raspberry Pi can serve, but let's say it can serve five requests a second. If we only have one of them, five concurrent requests a second is the most we can serve, so if we have more than five we need to add another unit of increment; it's a step function up. This is not a capacity planning talk either (that's yet another talk), so we'll just ignore how you come up with these numbers.

In our existing setup, the way we would add another server is: we add another server; it gets another IP address, probably from within the same network block our existing server is in, though that's not guaranteed; and we would probably add that second IP address to the DNS records for our service's name. Then, depending on a multitude of things (there are several people here who I suspect are interested in the DNS) — DNS client library behaviour, stub resolver behaviour, recursive resolver behaviour — many different things can influence whether an end user sees two IP addresses or just one, and what order they are returned in. But let's assume all of this works as specified and designed: if you have large enough amounts of load, you will probably end up with about half of your load going to each of your servers.

And this is fine, except when a server goes away. When a server goes away, about half of your clients will see timeouts: they get both IP addresses back, you're relying on the client to pick between them, and if they pick randomly, half of your clients will get timeouts. Someone will probably tweet at you; unless you have monitoring, in which case you'll know before someone tweets at you. The way you fix this is you change your DNS configuration: here, .21 is the server that has died, so we take .21 out of the DNS records on the left-hand side, and eventually, after a TTL (a time to live), your users will be happy again. We'll talk about TTLs shortly.

Some clients may be smart about this. Depending on the client software you're using, it may say "I've got two endpoints to connect to, but one of them is dead, so I'll just prefer the other one". But if this is just a browser — these could be people using Android Froyo or whatever, trying to access your website — it is by no means guaranteed that clients will be what I would consider well-behaved.
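The "well-behaved client" the talk hopes for might look roughly like this sketch (an illustration, not from the talk): resolve the name, then try each returned address in order, falling back to the next on failure.

```python
# Sketch of a client that tolerates one dead server: try each address the
# DNS returns, in order, and fall back to the next on failure.
import socket

def connect_any(name: str, port: int = 80, timeout: float = 2.0) -> socket.socket:
    last_error = None
    for family, type_, proto, _, sockaddr in socket.getaddrinfo(
            name, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, type_, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # first address that answers wins
        except OSError as exc:  # timed out, refused, unreachable...
            last_error = exc
            sock.close()
    raise last_error or OSError(f"no addresses returned for {name}")
```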
So, TTL trade-offs. There are two dimensions here: you can have super long TTLs, or you can have very short TTLs; very few people do anything in the middle. When I say super long, I'm talking a day or multiple days; when I say short, on the order of ten seconds to five minutes. I'd be interested to know what the actual distribution is for names on the internet, but I suspect people do one or the other.

If you have long TTLs, your users will not see any change you make for a significant period of time. 86400 seconds is the number many people know if they've configured DNS zone files; if you have that set, any change you make will take at least one day, probably longer, to be visible to all of your users. Conversely, very short TTLs mean much higher load on your DNS infrastructure: if you say "this DNS response is only valid for five minutes", you end up with significantly more load on the DNS, with the benefit that changes you make are more quickly visible to your customers, to the endpoints you're trying to direct to your service. Clients have to query the DNS more often, which adds latency, and various DNS implementations have weird timeout bugs and whatnot, which can add significant amounts of user-visible latency.

The other big thing to bear in mind is that the lower your TTLs are, the higher the proportion of your error budget is consumed by DNS: when you have short TTLs, you need much more reliable DNS, because any given request is much more likely to be hitting the DNS in some form. And finally, many DNS client implementations, stub resolvers and recursive resolvers, will ignore TTLs below a certain threshold. I don't know what the spec says; I'll hand-wave five minutes, and I've seen some people use ten seconds, but again, you have to know what set of DNS clients you're dealing with and what they will or will not obey. Five minutes is a reasonable rule of thumb for a lower bound on the internet.
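A back-of-the-envelope sketch of the trade-off just described, with made-up numbers: lower TTLs propagate changes faster but put proportionally more query load on your authoritative DNS servers.

```python
# Made-up numbers illustrating both TTL dimensions: worst-case propagation
# time of a DNS change, and steady-state query load on the authoritative
# servers (each caching resolver re-queries roughly once per TTL).
ACTIVE_RESOLVERS = 100_000

for ttl in (86_400, 300, 10):  # one day, five minutes, ten seconds
    qps = ACTIVE_RESOLVERS / ttl
    print(f"TTL {ttl:>6}s: change visible everywhere after ~{ttl}s, "
          f"~{qps:,.0f} queries/sec to authoritative DNS")
```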
So, back to our story. We now have multiple servers with different public IP addresses and a DNS record that points to both of them. We're now doing load distribution: we have multiple pieces of serving infrastructure. We don't really have high availability.

The other thing to bear in mind: I keep saying "load distribution" here, and the talk's title is about load balancing. The reason I say load distribution is that the DNS has no concept of how loaded or busy any one of these servers is. In our current setup, it's just assuming that all queries cost about the same amount of compute on your machines and last about the same amount of time, so you'll get approximately equal load. But if you have a mix of queries — maybe some owls are very expensive to serve and some owls are very cheap to serve — there will be all sorts of interfering effects.

High availability is not a great story either. Firstly, you've got propagation delay, which is a function of whatever TTL you've set in the DNS. Secondly, you don't really have health tracking: most DNS name servers — authoritative name servers, excuse me — do not have the concept of health checking. If you're running in, say, Kubernetes, or in a public cloud, some cloud providers have a DNS product that can check the health of an instance and remove it from a DNS load balancing group, but this varies very widely depending on what you're using. Flexibility is also limited. By flexibility I mean the ability of human operators or automation to redirect traffic in response to failures or for planned maintenance: depending on what DNS provider you're using, this could end up being "edit a zone file, then SCP it or post it via HTTP, then wait for it to propagate through whatever magic", and it's still going to be quite slow.

To be clear, this is not us ripping on the DNS. DNS was not designed for load balancing. If any of you follow Paul Vixie on public mailing lists, he's quite clear: this is not what it's for. Unfortunately, because it's in everything, people end up using DNS for load balancing, because it's the knob they have available.

So the next thing is that DNS TTLs are a problem: even in this case (where's my failure slide?) where we have five-minute DNS TTLs, serving 50% errors for five minutes is not a great story. It offends me on a fundamental level. We need a better story for this, and so the next thing people often do is add a network load balancer, which is what this black box in the middle is. In practice it's probably a cluster of network load balancers (we drew one because the diagram was going to get busy otherwise), and these could be third-party vendor appliances, or commodity x86 servers running some open source software. What you do here is you put the public IP address associated with your service's website on your load balancer, and it is then configured to say: for requests to this public IP address, here are my two possible backends, which are now addressed out of private IP address space.
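To make the proxy-mode data path concrete, here is a toy sketch of a connection-level TCP load balancer: accept on the public address, pick a backend from the private pool, and shuttle bytes both ways. The backend addresses and round-robin choice are assumptions, and a real network load balancer operates on packets in silicon or kernel bypass rather than proxying connections in Python.

```python
# Toy connection-level load balancer: listen on the public service address,
# connect each new client to a backend in private address space, and copy
# bytes in both directions.
import itertools
import socket
import threading

BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes src -> dst until EOF, then signal EOF to the peer."""
    try:
        while data := src.recv(65536):
            dst.sendall(data)
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # connection reset etc.; cleanup elided in this sketch

def serve(port: int = 80) -> None:
    listener = socket.create_server(("0.0.0.0", port))
    backends = itertools.cycle(BACKENDS)  # naive round-robin choice
    while True:
        client, _ = listener.accept()
        backend = socket.create_connection(next(backends))
        threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
        threading.Thread(target=pipe, args=(backend, client), daemon=True).start()
```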
You may or may not put them in private address space; in IPv6 you might use non-routable space, and would that be "private"? That gets into philosophical conversations best discussed over a beer. But this now gives you the ability to take servers out of rotation, either for maintenance or in response to failure, without any client-visible changes, because the configuration is on a device you control. The DNS is back to having a single IP address in the A record, and you could actually have really long TTLs with this setup, because your high availability is no longer coming at the DNS level; it's coming at the network level.

Things to bear in mind: this network load balancer, most of the time, will not know anything about your application layer protocol. All it will do is look at the IP header of incoming packets, and possibly the TCP header — so source IP, destination IP, source port, destination port — and then pick which backend to use. It also has a ton of state in it, which we'll talk about later; as a former network engineer, I think state in network devices is a terrible idea. But it does give us a lot more high availability and control over traffic routing: maintaining either of the two individual servers becomes a lot easier.

OK, so now we're going to go off into network land for a bit. I talked about network multipathing early on: the public internet has a lot of multipathing, and your data center provider probably has many redundant paths between machines in their data centers. Mechanically, how does this work? Network devices typically use what's called five-tuple or three-tuple hashing; I'll explain five-tuple, and the process for three-tuple is the same. Say that for a given destination you have n different possible paths. You're a network device, you want to send a packet to 1.2.3.4 or whatever, and there are ten paths you can take. How do you choose? The thing the industry has settled on is: you look at the packet and take the source IP address, the destination IP address, the protocol (TCP or UDP or ICMP or whatever), and, if it's a protocol that has port numbers (TCP or UDP), the source port and the destination port. You produce a hash of that entire bit pattern — which network hash functions you should use is a whole other talk — and based on the output of that hash function for a given five-tuple, you index into your list of paths and pick which one to use.

Why do we use the five-tuple? Primarily because TCP reacts very poorly — I should be slightly more careful: TCP used to react very poorly — to packet reordering. If you send two packets from the same TCP flow across different network paths, it is possible, even likely, that they will arrive at the far endpoint in a different order from the order in which they were transmitted. If TCP sees an out-of-order packet, it will treat that as analogous to packet loss, and packet loss is what TCP uses to say "oh, there's congestion, let me slow down". So ideally you want all the packets from the same flow to traverse the same set of network links.
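A sketch of the five-tuple hashing just described, with hashlib standing in for the carefully chosen hash functions real devices implement in silicon; the paths and the example flow are made up.

```python
# Five-tuple hashing: hash the flow identity and index into the candidate
# paths (or backends).
import hashlib

PATHS = ["link-0", "link-1", "link-2", "link-3"]

def pick_path(src_ip, dst_ip, proto, src_port, dst_port):
    """Every packet of a given flow hashes identically, so the whole flow
    stays on one path and TCP never sees hash-induced reordering."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(PATHS)
    return PATHS[index]

flow = ("198.51.100.7", "203.0.113.20", "tcp", 49152, 80)
assert pick_path(*flow) == pick_path(*flow)  # stable for the flow's lifetime
```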
The diagram below shows some example inbound and outbound flows to two racks full of servers. One thing to keep in the back of your head, basically always — unless you're in a very constrained data center or enterprise environment where your network operations, engineering, and architecture people have set things up otherwise — is that traffic in the network is asymmetric: the traffic for a given flow in one direction will go over a different set of links than the response traffic coming back. Just something to always keep in your head.

What else do I want to mention here? Oh, hashing. Google's Maglev paper is several years old now, but it explains some of the challenges in picking network hashing. Facebook have a thing called Katran, their software network load balancer, which they have open sourced, and which implements one of the hashing algorithms described in the Google paper.

OK, so the next thing: the hash-based algorithm is a very simple one, because it's just "for this flow, pick this backend". But you can do more interesting things in layer 4 load balancing (when I say layer 4, that's network layer 4: TCP and UDP, the transport protocols, are considered layer 4). You can do things like least connections: because your load balancer, in this setup, is in the path of every request, it knows how many open connections or pending requests there are to each backend, so you can say "pick the one that has the fewest connections", which for some workloads is actually not an unreasonable thing to do, assuming you never run out of capacity.

So this gives us a lot more power, because we can add more instances to the backend pool completely transparently to the users: there's no DNS change; it's basically arbitrary. The only constraints are space and power and how quickly our physical provider can give us new instances; if you're in the cloud, this is just "click a button in the AWS console or the Google Cloud console or whatever" and it appears. We can add health checks here as well, because we can say: "hey, network load balancer, I know you're only doing packet forwarding, but to establish the health of a server, rather than just pinging it or connecting on a TCP port, hit this particular URL, because this is a web server, and if it can serve owl-from-winnie-the-pooh.jpeg, then it's probably a happy thing."

So this is much better. It gives us load distribution across multiple pieces of infrastructure, it avoids unhealthy instances, and it gives operators the ability to take individual servers out for maintenance. The things it doesn't give you: it doesn't give you knowledge of load on servers, or of the differences between requests, because multiple requests can go over the same TCP connection. It's also not content-aware: it's not going to be able to route different requests to different backends (maybe you want your images served from over here, but your JavaScript served from over there). And it's hard to do sensible load shedding and denial-of-service or abuse protection, because to do that well you typically need application-level heuristics and information for fingerprinting.
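The least-connections policy mentioned above might be sketched like this: the balancer tracks open connections per backend and sends each new connection to the least-loaded one. Backend names are placeholders.

```python
# Least-connections selection: the balancer sees every connection open and
# close, so it can steer new ones to the backend with the fewest in flight.
class LeastConnections:
    def __init__(self, backends):
        self.open_conns = {backend: 0 for backend in backends}

    def acquire(self):
        backend = min(self.open_conns, key=self.open_conns.get)
        self.open_conns[backend] += 1
        return backend

    def release(self, backend):
        self.open_conns[backend] -= 1

lb = LeastConnections(["10.0.0.1", "10.0.0.2"])
first = lb.acquire()   # ties broken by iteration order
second = lb.acquire()  # goes to the other backend
lb.release(first)      # next acquire() prefers the drained backend
```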
OK, so we've been hiding a lot of network detail; I'm going to talk fairly quickly because I'm time-constrained. All of this stuff where we've said "there's a magic network between your servers and the internet" — let's talk a bit more about how that works.

Stateless network load distribution is just having many network devices in a path. In this case, say you have two edge routers in your data center, and then two top-of-rack switches above your racks of servers. 1+1 redundancy is a very common model in network provisioning; 3+1 or whatever is becoming more common, but historically network vendors have gone for scale-up rather than scale-out, although this is changing. If we consider one server in a rack, connected to a single switch, with two routers connected to that switch, you can do something like VRRP or similar to get high availability at your network gateway level. At the routing level, you can say one of these two routers (these two circles) is the primary gateway to the rest of the internet and for receiving incoming traffic, or you can load balance across the two; this is what's called IP multipath.

With this setup, if you have a broken router, traffic naturally converges via the other path. If you have a broken switch, it depends on whether your server is connected to multiple switches: if it's only connected to one switch, the server goes away, but hopefully you have a redundant server in the other rack. Now, if you have racks full of machines, all connected to two switches, how do you make use of all of your bandwidth? This is a capacity question rather than a high availability question, and the answer is something called ECMP, equal-cost multipath. This is a term you'll hear network people throw around: you have many links, they all have the same cost, and you spread traffic across them using the hashing algorithms we described before. Different flows go over different links, and suddenly your available bandwidth is the sum of all the link bandwidth going up or down from a given network device. Typically this is all stateless: it's all done in silicon, and there's no per-connection or per-flow state in any of these devices.

Caveats: as before, all flows are asymmetric. The other big thing here is that elephant flows are a real problem. Elephant flows are where you have a single flow which is larger than the biggest link you have in a bundle. Say your data center provider is using all 10 gig links and you have a flow that is 20 gig: there is no way to solve this problem with hashing; they would need to operate 40 gig or 100 gig at the physical layer to serve that flow.

And what about soft link failure? Soft link failure is generally much harder in the networking world than elsewhere, because what it turns into is intermittent packet drops, and often the hardware just lies: it will not increment a counter, it will not tell you anything is wrong. The only real way to solve this is to have some sort of external monitoring where you try to spray traffic across as many links as possible and see what has gone down. Facebook have a thing they've open sourced — I think it's called NetNORAD — which does this kind of introspection.
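As a rough illustration of the "spray traffic across as many links as possible" idea (not Facebook's actual implementation): because ECMP hashes on the five-tuple, probes sent from many different source ports traverse many different paths, so loss concentrated on a few source ports hints at one flaky link. The target address, port range, and the assumption of a UDP echo responder at the far end are all made up.

```python
# Probe many ECMP paths by varying the UDP source port; persistent loss on
# a few ports suggests a flaky link somewhere on those hashed paths.
import socket

def probe(dst: str, dst_port: int, src_port: int, timeout: float = 0.5) -> bool:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("", src_port))     # fix the source port to pin the path
        sock.settimeout(timeout)
        sock.sendto(b"probe", (dst, dst_port))
        sock.recvfrom(32)             # wait for the echo
        return True
    except OSError:
        return False
    finally:
        sock.close()

loss = {port: not probe("192.0.2.10", 31338, port)
        for port in range(32000, 32064)}  # 64 distinct five-tuples
print(sorted(port for port, lost in loss.items() if lost))
```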
OK, I'm going to skip over this, and this as well, and go to layer 3 DSR; I'll talk about these two slides. With network load balancing, we talked about the proxy setup, where all packets, inbound and outbound, go through the same device. The problem with this is that you are then bandwidth-limited by that device. If you have a very asymmetric workload — and for web requests this is typically true unless you're taking a lot of upload traffic, since your request size is usually significantly smaller than your response size — it would be nice to make use of all of your inbound bandwidth on a single load balancer while having many backends serve the responses. This is the idea of direct server return: your incoming packets go through your network load balancer, but your return packets don't; they go directly back out to the internet, or to your office users, or whatever. We used to do this using Ethernet trickery; I'm going to skip over that, and we will make the slides available so people can read about it after the fact.

With layer 3 DSR — this is a long set of slides; which one do I want? I want this one — the short version is: the load balancer receives an IP packet, looks at which backend it wants to send traffic for this flow to, and then sticks another IP header on the outside of that packet. What this means is that your network load balancer doesn't need to be in the same rack, or connected to the same Ethernet switch, as any of your backends. And it allows you to preserve the original IP address information, which is very useful for spam, abuse, and reputation protection and various other things.
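A sketch of the layer 3 DSR encapsulation just described, using scapy (a third-party library, `pip install scapy`) purely to make the packet structure visible; all addresses here are documentation or private-range examples.

```python
# Layer 3 DSR encapsulation made visible with scapy. The balancer wraps the
# client's original packet in an outer IP header addressed to the chosen
# backend; the backend strips it, sees the original client/VIP addresses,
# and replies straight to the client.
from scapy.all import IP, TCP

CLIENT, VIP, BACKEND = "198.51.100.7", "203.0.113.20", "10.0.0.2"

# The packet as it arrived at the load balancer:
inner = IP(src=CLIENT, dst=VIP) / TCP(sport=49152, dport=80, flags="S")

# The packet as forwarded to the backend: protocol 4 is IP-in-IP.
encapsulated = IP(dst=BACKEND, proto=4) / inner

encapsulated.show()  # outer IP -> inner IP (original addresses) -> TCP
```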
So, back to our story. So far we've only talked about a single data center. There are many reasons you might want multiple data centers. One is disaster recovery: you don't want an earthquake or a tsunami or whatever to take out all of your serving capacity. But there is also a benefit for users: say you're a US-based company with your only data center on the East Coast; your user experience for West Coast users is going to be terrible. If you can have multiple locations, and steer users to those locations, that will be better.

Most of the time, when people do this, the first thing they do is something called anycast. This is a whole other talk that I am not giving this time. Anycast is hard to monitor, but it is cheap and easy to do up front: you announce the same address from multiple locations, and users go to the closest one. It's not really load balancing; I gave a whole separate talk about this, and you can find the slides from SREcon last year, I think.

The other option is to have your multiple data centers use different IP addresses: they have completely independent net blocks, and then we're back to the DNS configuration where you have two IP addresses for a given name. There are upsides and downsides here, because if an entire data center goes away you still have the DNS TTL problem: you have five minutes, or a day, before your users are redirected. The way a lot of people solve this is with something called backup routes, or cover routes, where both locations announce both sets of IP prefixes, but each one is primary for one of them. If the other data center goes away, a location automatically starts serving that data center's IP addresses, but in normal operation each one typically only serves traffic for its own prefix.

Then there are the perils of DNS geo load balancing. The idea here is that, based on the end user's request, you work out where to send them: "I think this user is approximately on the US West Coast, or on the US East Coast, and I will send them to the relevant data center." The problems: firstly, IP addressing, particularly in IPv4, is very, very fragmented. Secondly, the IP address you see from DNS requests is typically not the end user's IP address but the address of their recursive resolver, and you may have a large user population behind a single recursive resolver with very different latency profiles to your service. Say you're an ISP in India and you put all of your users behind a single DNS server: their user experience is going to be terrible if that's the only signal you have for load balancing.

There is an extension called EDNS Client Subnet, which allows you to embed information about where your clients are in queries to the upstream authoritative name servers. That gives you a hint: "I know I'm a DNS server over here on the west coast of the US, but this user is actually coming from Hawaii, or from the east coast, and you should route them accordingly." This is implemented by a bunch of content delivery networks and a bunch of DNS providers; I don't think it's actually a standards-track standard, but pretty much all the large providers implement it at this point (there's a sketch of what such a query looks like at the end of these captions).

I have one minute left, so I'm going to skip to — where's the thing, damn it, there we go. When you read these slides, this is what you should take away: do you care about capacity? Do you care about high availability? Do you care about utilization? How much control do you want over traffic routing? How much telemetry do you want — do you only care about network-level telemetry, or do you care about service-level telemetry? Which of those you care about will in turn impact which of these technologies you select. There's a whole bunch of stuff at the application layer we haven't talked about; again, the slide content is fully available, and there is a bunch of reference material on the slides which people can go and read up on in detail. I think LCA publish the slides on the website — do they? Yeah, so I'll send those through after the fact. And with that, I think I'm done.
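As promised above, here is a hypothetical EDNS Client Subnet query using dnspython (a third-party library, `pip install dnspython`): the query carries a client-subnet hint so the authoritative server can return a geographically sensible answer. The name server address and subnet are placeholders.

```python
# Hypothetical ECS query: attach a client-subnet option so the authoritative
# server knows roughly where the end users behind this resolver are.
import dns.edns
import dns.message
import dns.query

ecs = dns.edns.ECSOption("198.51.100.0", 24)  # "this query is for clients here"
query = dns.message.make_query("superbowls.com", "A", use_edns=0, options=[ecs])
response = dns.query.udp(query, "192.0.2.53", timeout=2.0)  # authoritative NS
print(response.answer)
```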
Info
Channel: linux.conf.au
Views: 1,697
Rating: 4.75 out of 5
Keywords: lca, lca2019, linux.conf.au, linux, foss, opensource, MuraliSuriar
Id: FC0DARpayhw
Length: 31min 9sec (1869 seconds)
Published: Tue Jan 22 2019