How to scale WebSockets to millions of connections

Captions
I have been going in depth on WebSockets lately: peeling through the spec, reading whole books on the subject, and watching my own share of YouTube videos along the way. I saw many people asking questions about the best way to scale WebSockets. It's a very good question, because WebSockets operate over a stateful, long-lived connection, which requires the server to tie up resources like memory and CPU for an indefinite period of time. This is in contrast to a stateless protocol like HTTP, where connections are short-lived and therefore less taxing on the server in the short term and easier to scale in the long term. So if you have a lot of WebSocket connections, or anticipate having a lot of WebSocket connections, a server might start to hit the upper limits of the hardware, causing performance and stability issues, usually right when your app is starting to gain traction, which is super frustrating. We definitely don't want that. That's why I'm consolidating everything I've learned about WebSockets and scaling WebSockets into this one easy-to-follow video. While this is not a tutorial per se, meaning we won't be looking at step-by-step instructions or any specific technologies, you will learn the general approach and patterns used to scale WebSockets to thousands, tens of thousands, hundreds of thousands, even millions of connections in the case of some companies. I'm Alex with Ably, the realtime infrastructure API. Let's take a look at the challenges of scaling WebSockets.

I think it's fair to assume that if you're watching this video you have a good grasp of WebSockets: how they work and the situations in which they thrive. I did want to point out at the very beginning, however, that the WebSockets API is quite minimal and therefore efficient by default. Sometimes when you're looking online you'll see benchmarks that show raw WebSockets scaling to millions of connections, but this isn't totally realistic, because you often need to do a lot of additional processing to take a WebSocket solution to production: things like heartbeats, or buffering undelivered messages. All of these can make WebSockets more resource-hungry than some benchmarks make them seem.

Another thing to consider from the very beginning, which is actually easy to overlook, is whether you need a fallback to an alternative transport like HTTP long polling, in case the WebSocket connection is blocked by a firewall or some misconfigured proxy. You can learn more about HTTP long polling on the Ably blog, but suffice it to say it's very widely compatible but also very inefficient, which will put more tax on your server. Besides, scaling something like HTTP long polling, which is just one of the available alternative transports, requires a totally different approach than scaling WebSockets, further adding to the complexity.

Okay, with those things out of the way (they apply to any method of scaling WebSockets; they're just part of running WebSockets in production), let's take a closer look at your options for scaling WebSockets. Generally, there are two ways to scale any back end. There's vertical scaling, sometimes called scaling up, where you keep adding more power or resources to an existing machine: more CPU, more RAM, that kind of thing. And there's horizontal scaling, sometimes called scaling out, where you add more servers to the network and a special piece of software called a load balancer shares the load evenly between them.

Let's take a closer look at vertical scaling as it relates to WebSockets first. A very natural question to ask, and one I asked myself as well, is: how many WebSocket connections can one server practically support in production? It's a good question, because if that number is something you're comfortable with, and you think it can handle your user base and your growth, then you can just keep adding resources to a single server, and that's
going to be a lot simpler than horizontal scaling. Sadly, there isn't a good answer to that question, because it really does depend on your server resources, but also on your WebSocket implementation and your specific requirements. Besides, even if that number were really high, there are some practical considerations to do with vertical scaling that you should be aware of. I've put together a little list; we'll run through it quite quickly.

First of all, vertical scaling can be expensive, because you need to provision your server with enough resources to handle what you think will be your maximum number of concurrent connections, even though in reality you won't be operating at that capacity most of the time, for example at night, while most of your users are offline.

Secondly, even if you do provision a very powerful server (which can be expensive, by the way), you will probably run into a resource limit somewhere, a limit that requires expertise to overcome. For example, maybe you hit the limit of available threads or ephemeral ports and have to start tinkering with kernel parameters. It's possible, and companies do this, but it's not something I would feel comfortable doing on a production server when I'm not an expert at it, especially at the scale where this starts to become relevant.

And here's the thing: when you have one WebSocket server in production, it doesn't matter how powerful it is; you also have a single point of failure. Say you were tinkering with those kernel options like I just described in production, or, more realistically, you just want to upgrade the server, do a redeployment, or anything else fairly routine like that. You need to be really careful not to interrupt the user's experience by temporarily taking the service offline while you perform that change or update. In many cases this isn't really possible with a single server, so you need to strategize about when you're going to make the updates: you'll make them infrequently, and during times when your app is less busy. Basically the antithesis of continuous deployment.

What is probably even more concerning is the chance of something catastrophic happening. Maybe your code, God forbid, has a bug, like a memory leak that creeps up on you, or some other kind of error that requires the service to be restarted. Perhaps your code is perfect, and that's great, but you're still dependent on a service provider that might temporarily go offline in the region or availability zone where your WebSocket server is deployed. Again, it doesn't matter how well provisioned that server is; it's still vulnerable to this kind of risk.

Maybe horizontal scaling can solve some of these problems. Let's take a closer look. Horizontal scaling is when you spread the load across multiple machines. Now you have the option to dynamically add or remove servers in response to changes in demand: when things get busy, you can add servers to the array, and during the quiet hours, for example, you can remove servers and save some costs. If one machine does go offline for any reason, it might not be the end of the world, because another machine in the array can likely pick up the workload. A crucial part of the horizontal scaling equation is the load balancer, which, from a back-end architecture point of view, sits in front of this array of machines and balances incoming WebSocket connections evenly between them. There are a few load-balancing algorithms to choose from (here are some on screen), but the default in many cases is a round-robin algorithm, where the load balancer simply alternates: server one, server two, server one, server two, and so on. Horizontal scaling is how you build a really scalable and robust back end, but unfortunately it's not as simple as adding some servers and switching on a load balancer. In fact, the biggest downside of horizontal scaling is the architectural complexity it introduces.
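The round-robin routing just described can be sketched in a few lines of Python. This is a minimal illustration, not a real load balancer; the server names are purely illustrative:

```python
from itertools import cycle

# Hypothetical pool of back-end WebSocket servers.
servers = ["ws-server-1", "ws-server-2", "ws-server-3"]
rotation = cycle(servers)  # endless round-robin iterator over the pool

def route_connection(client_id: str) -> str:
    """Assign the next incoming connection to the next server in rotation."""
    target = next(rotation)
    print(f"{client_id} -> {target}")
    return target

# Six incoming connections are spread evenly across the three servers.
assignments = [route_connection(f"client-{i}") for i in range(6)]
```

Each client simply gets the next server in the cycle, so the connection count stays balanced as long as connections are roughly uniform; smarter algorithms (least connections, for instance) account for the fact that they are not.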
I've got a little diagram here to help illustrate the point. As you can see, a load balancer can evenly distribute incoming connections between servers. This means connections can get routed to different servers, which is a good way to spread the load, but a client connected to one server will not receive messages sent to another, and this is a problem. So imagine you're building a chat application, for example, and the load balancer routes user A to server 1, and then, in true round-robin style, user B gets connected to server 2. What if these two users want to communicate? User A might send a message through server 1, but the recipient, user B, is connected to server 2, and looking at this diagram, there's no link between server 1 and server 2. Server 2 has no idea what's happening on server 1. They've managed to distribute the load, which is good, but they're totally disconnected from a state point of view, so there's no opportunity for the message user A sends to actually get delivered to user B.

Another way to illustrate this point is with a broadcast-type scenario. Say you want to broadcast a real-time update to every connected client. In very basic, simplistic terms, it might be handy in your code to get a list of all the WebSocket connections; then you can loop over them and send the update to each client. But at the moment those connections are fragmented across two servers, so there's no single place you can go to get that list of connections in order to do a broadcast.

The solution is to store the connection state out of process, using a message broker, for example. Many developers choose to use Redis and the pub/sub design pattern. Going back to the chat app example: when user A sends a message through server 1, server 1 forwards the message to server 2 via Redis, creating a connection between these servers. Server 2 can then send the message to user B over the WebSocket connection. This is sometimes called adding a backplane to your infrastructure, and it enables your servers to have a shared and synchronized view of the state.

So is horizontal scaling the magical silver bullet, then? No, not at all; there are some challenges with horizontal scaling as well. The main challenge of scaling WebSockets horizontally is to do with data synchronization, as we've just covered. I'll quickly add that you also need to sync connection state, so that if user A goes offline, user B can be notified; if you're building a chat application, say, you'd show that that user has gone offline. Let's not forget that now that you're sharing application state in Redis, you have a new single point of failure, so another challenge to consider is how you replicate Redis, or whatever broker you're using (there are alternatives out there), to ensure that your system stays robust in the case of failure.

Something else that can be a burden for back-end developers is what to do when one of the servers in the array is reaching its hardware limits. To ensure a server doesn't become overloaded, you need a mechanism to detect when the server is nearly at a limit and reject incoming connections, first of all, so as not to exacerbate the issue. Then you probably need to shed some existing connections as well, which is where you force them to disconnect; they will then try to reconnect to the service and hopefully make their way to a healthier back-end node. This is a tricky mechanism to implement as I understand it, and it also leads to another hairy type of problem around restoring connections. Say you do shed connections from one server, or maybe one server just goes offline; in both cases, a bunch of connections can now come thundering back, hitting the same server at the same time. This will put undue pressure on the server, which can lead to connection errors and connection delays; the server's performance can degrade, increasing latency, and maybe the server just goes offline at some point, which is obviously
no good. All of these problems can be solved, but they do require a certain degree of complexity.

In this video we've spoken about the challenges of scaling WebSockets. Many people, understandably, ask how many WebSocket connections one server can support, but it's kind of a moot question: no two connections are the same, first of all (some connections are more active than others), and besides, having one server means having a single point of failure, which can make your application prone to downtime, which in turn can affect your users' confidence in the service, or even affect business continuity in some cases. As you've seen in this video, horizontal scaling is the key to building a reliable and highly scalable back-end WebSocket service, but it does introduce complexity, as you need to manage a load balancer, introduce several servers, and, of course, find a way to synchronize state between those back-end servers.

So what's the conclusion here? Is horizontal scaling the go-to way to scale a back-end WebSocket server? Maybe. Only you know what you're building and your specific requirements. What I've learned in both my research and my conversations with engineers at companies is that some companies do use one single, vertically scaled server. If you're running a powerful server with well-optimized and fault-tolerant WebSocket code, it can handle a fair few connections, there's no doubt about that, but there is that very real risk of downtime. Maybe you feel that risk of downtime is worth it compared to shouldering all of the complexity of scaling WebSockets horizontally at this stage of your application, depending on what you're building and your own requirements. Equally, there are going to be some businesses, and maybe yours is one of them (financial apps, for example, platforms, and other types of services where the real-time experience is truly core to the main user flows), where you just can't afford any downtime; it's not an option, and in that case it does make sense to invest more in horizontal scaling and setting that up in the first place. I'm not here to say which is right for you, just to present the options and the challenges.

Of course, if you'd like to learn more about how to specifically scale WebSockets, that's something I can help with as well. Comment below, and if there's enough interest, you'll see a video about that when you subscribe to the Ably YouTube channel.
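The pub/sub backplane described in the video can be illustrated with a small, self-contained sketch. The `Broker` class below is an in-process stand-in for a real broker such as Redis, and the "chat" channel, server names, and user names are purely illustrative:

```python
from collections import defaultdict

class Broker:
    """In-memory pub/sub: every callback subscribed to a channel
    receives every message published to that channel."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)

class WebSocketServer:
    """Each server instance only knows about its own local connections."""
    def __init__(self, name, broker):
        self.name = name
        self.connections = {}  # user -> inbox of delivered messages
        self.broker = broker
        broker.subscribe("chat", self.on_backplane_message)

    def connect(self, user):
        self.connections[user] = []

    def send(self, sender, recipient, text):
        # Publish via the backplane so *every* server sees the message,
        # regardless of which server the recipient is connected to.
        self.broker.publish("chat", (sender, recipient, text))

    def on_backplane_message(self, message):
        sender, recipient, text = message
        # Deliver only if the recipient is connected to *this* server.
        if recipient in self.connections:
            self.connections[recipient].append((sender, text))

broker = Broker()
server1 = WebSocketServer("server-1", broker)
server2 = WebSocketServer("server-2", broker)

server1.connect("user-a")  # the load balancer routed user A to server 1...
server2.connect("user-b")  # ...and user B to server 2

# User A's message goes through server 1, crosses the backplane,
# and is delivered by server 2, which holds user B's connection.
server1.send("user-a", "user-b", "hello")
```

User A's message reaches user B even though they sit on different servers, because the backplane fans the message out to every server and only the server holding the recipient's connection delivers it. With a real broker like Redis, the fan-out crosses machine boundaries the same way.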
Info
Channel: Ably Realtime
Views: 14,349
Keywords: ably, realtime, websockets, mqtt, sse
Id: vXJsJ52vwAA
Length: 14min 0sec (840 seconds)
Published: Tue Aug 08 2023