[EN] Don't Wait For Me! Scalable Concurrency for Ruby 3! / Samuel Williams @ioquatix

Video Statistics and Information

Reddit Comments

Looks exciting. Happy to see Ruby will get considerably faster. I'm not completely sold on the transparent syntax, though. For example, in Node, I like to know when a function is async and when it isn't. It's a very easy abstraction to understand. Fibers and the Scheduler seem more complex, if you ever have to deal with them, and you might not even know which functions are async and which aren't. But I guess time will tell :)

Overall I think it's a step in the right direction. Can't wait to have this integrated into Rails. Having it in RSpec and ActiveRecord will be the biggest potential game changers IMO.

👍 3 👤 u/fedekun 📅 Oct 16 2020 🗫 replies
👍 1 👤 u/schneems 📅 Oct 16 2020 🗫 replies
Captions
Hello, and welcome to my talk: Don't Wait For Me! Scalable Concurrency for Ruby 3. Today we are going to talk about Ruby scalability, what it is, and how it impacts your applications. Then we are going to look at the new Ruby fiber scheduler, how it is implemented, and how it can improve existing applications with minimal changes.

So, what makes Ruby a great language? Ruby has many good qualities, but I wanted to know what other people thought, so naturally I asked Twitter. Here are the results: syntax was the most important, followed by semantics, libraries, and finally scalability. Someone also voted for Matz. Thank you, Matz, for creating Ruby and supporting my work.

Looking at these results, we might wonder: is scalability important? Why did it come last? Many companies are handling large amounts of traffic with Ruby, and yet it looks like scalability might be a weak point. Before we go any further, we should discuss what scalability is, so we have a clear idea of the problems we are trying to solve and how we can make Ruby better in this regard.

Scalability is fundamentally about the efficiency of a system: how many resources are required as the size of the problem increases. Even on this slide we can compare the efficiency of two systems, English and kanji. English requires six times more characters to say the same thing, which is less efficient.

As we increase the size of our system, efficiency issues manifest in many different ways. One big issue is resource utilization: how much time we spend processing versus waiting. If we do not efficiently utilize the hardware, our scalability will be worse and our running costs will be higher. Cloud providers love it when you build inefficient systems, because they make lots of money, but we want to avoid this situation to make our business growth more cost effective.

Modern processors are so fast that they often spend a lot of time waiting on memory, storage, and network I/O. In order to utilize the hardware more efficiently, we can time-share between multiple tasks: when one task is waiting, another can execute. This is one way to improve scalability.

Typical operating systems use processes and threads to execute tasks. Threads are very similar to processes, but they have reduced isolation and fault tolerance because they share memory and address space. A single hardware processor can context switch between these tasks in order to complete all the required work as quickly as possible. We can add more hardware processors to increase the scale of a system; this allows multiple tasks to execute simultaneously, and in the case that one task is waiting, the processor can move to another task. But shifting between tasks is not free. Context switching, as it's called, is a relatively expensive operation at the system level, and as processors dance around between tasks, the overhead of switching can become significant.

We refer to this simultaneous execution as parallelism, and it is an important part of any scalable system. However, it also introduces significant complexities, including non-determinism, race conditions, and fault tolerance. Application code should not need to deal with these complexities, and it has been shown time and time again that thread-safety issues are a significant source of bugs. But when used carefully, isolated parallelism provides the foundation for scalable systems.

Another way to improve hardware utilization is to interleave tasks. There are several ways to do this: callbacks, promises, and fibers. When the application code would execute a blocking operation, the state is saved; instead of waiting, some other task will be executed, and when the application code is ready to proceed, it will be resumed.
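To make that fiber mechanism concrete, here is a tiny sketch in plain Ruby (the task names are illustrative, not from the talk's slides): each fiber saves its state at Fiber.yield and resumes exactly where it left off, which is what lets a single thread interleave many tasks.

```ruby
# Two cooperative tasks interleaved on one thread.
fibers = 2.times.map do |i|
  Fiber.new do
    puts "task #{i}: part 1"
    Fiber.yield              # the "blocking point": save state, hand back control
    puts "task #{i}: part 2"
  end
end

# Resume each fiber twice: once to start it, once to finish it.
2.times { fibers.each(&:resume) }
# Output: task 0: part 1 / task 1: part 1 / task 0: part 2 / task 1: part 2
```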
This save-and-resume mechanism allows us to combine tasks back to back. In a single process, we can minimize the system-level context switch overhead by implementing an event loop which can schedule work more efficiently. We refer to this interleaved execution as concurrency, and it is an important tool for highly scalable systems. Threads and processes are designed to expose hardware-level parallelism, which is typically limited by the number of processor cores in your system. In practice, process-per-connection or thread-per-connection implies an upper bound of 10 to 100 connections per system, which is simply not practical if you want to build highly scalable applications.

How does scalability impact cost? A lack of scalability in an application can cost the business. Not only does it increase your hardware cost, but it can impact your user experience if systems become overloaded. In addition, you may find you are building more complex distributed systems to account for a lack of scalability in your core application. These designs bring many hidden costs, including operational and maintenance overheads, and make it hard to move on new opportunities because of scalability debt.

Efficient, scalable systems are great for the planet: fewer computers are required for the same work. The technology sector is responsible for about two percent of global carbon emissions, and as a civilization we are facing an unprecedented set of challenges relating to climate change. We can help by improving the scalability of our code to reduce natural resource consumption and energy utilization. Our code can help to build more efficient systems for society; I believe this is our responsibility as software engineers.

Is Ruby fast enough? A huge amount of effort goes into Ruby, including significant improvements to the interpreter, garbage collection, and just-in-time compilation, yet we are still trying to improve performance. Let's consider this hypothetical request, which includes several blocking operations. I have split it into 10 milliseconds of processing time and 190 milliseconds of waiting time; the total time is 200 milliseconds. This is a very typical situation. Here is an example of a real Ruby web application which has a similar profile: at its peak, 25 milliseconds is spent executing Ruby code and 150 milliseconds is spent waiting on Redis and external HTTP requests.

So what happens if we make Ruby 10 times faster? In our hypothetical example, we reduce the time spent processing the request from 10 milliseconds to 1 millisecond. That's a 10-times improvement, but the total time is still dominated by waiting, and that hasn't changed. What about if we could make Ruby execution a hundred times faster? Surely that's got to help. Here is the result. It should be obvious now: even if we improve the performance of Ruby, there is still so much time being wasted by waiting.

So how do we make Ruby faster? What can we do to avoid waiting so long? Let's take our example of a single request and look at it in the context of a small program. Here is our single request, and in our application it's one of many. This kind of sequential processing happens everywhere: on clients interacting with multiple remote systems or fetching multiple records, or on an application server handling many discrete incoming requests. What if we could change this, so that while we were waiting for one request to complete, we could start on another? While the latency of the individual requests cannot be changed, we can greatly reduce the total latency of all the requests together and improve the application's throughput.
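Before the concurrent version, here is a sketch of the kind of sequential program the next section starts from. The numbers match the hypothetical example above; sleep stands in for a request's blocking I/O, and make_request is a hypothetical helper, not code from the talk.

```ruby
# Sequential requests: total latency is the sum of the individual latencies.
def make_request(host)
  sleep 0.2 # stand-in for ~10ms of processing + ~190ms of waiting
  "response from #{host}"
end

%w[a.example b.example c.example].each do |host|
  make_request(host)
end
# Total elapsed: ~600ms, because each request waits for the previous one.
```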
So how do we build this? Let's take a look at how we need to change the code, starting from our sequential program. Firstly, we are going to add a worker to wrap our request, so we can represent individual jobs that can run concurrently. Now let's expand what a request looks like: first we make a connection to the service, then we write a request, and finally we read the response.

Okay, let's make our read operation non-blocking. Firstly, we change read to read_nonblock. Then we need to check the result: if the data was not available, we will get the result :wait_readable. In that case, we could use a special system call, IO.select, which waits until the data is available; this is the operation which accounts for the waiting time in our program. Rather than waiting, we can use another special operation, Fiber.yield, which passes control out of the current worker and allows another worker to execute. We also store the current worker into a waiting table, indexed by the IO we are waiting on. Finally, we implement an event loop which waits for any IO to become ready using IO.select, and then resumes the matching worker so it can continue processing the response. Congratulations: this is the entire implementation of non-blocking, event-driven I/O for Ruby.

As you can see, the implementation is a bit verbose. We don't want developers to have to rewrite their code like this, because it exposes a lot of details that they shouldn't need to worry about. In fact, it's my belief that you should not need to rewrite code at all: scalability should be a property of the runtime system, not your application code. Let's rework this code and hide some of the details behind a tidy interface.

Firstly, let's encapsulate the waiting list into a scheduler object. This scheduler will provide the interface required to wait on I/O and other blocking operations, including sleep; we will use it to hide all the details of the event loop and the underlying operating system. Next, let's move the details of waiting on a specific IO into the scheduler implementation. There is already a standard method for doing this, called wait_readable. This allows the scheduler to handle the scheduling of individual tasks without the user needing to explicitly yield control or manage the waiting list, as these are usually implementation-dependent details. Finally, we move the entire event loop into the scheduler implementation and replace it with a single entry point, run. Because some operating systems have very specific requirements on how this loop is implemented, putting it into the scheduler improves the flexibility of the user code to work on different platforms.

As I said earlier, I don't believe we should need to rewrite existing code to take advantage of the scheduler. Other programming languages require explicit syntax such as async or await in order to do this, but we can take advantage of Ruby's fibers to avoid this explicit syntax and hide the await in the method implementation. In our code, we expanded the read operation to become non-blocking, but this detail can be hidden within the implementation of Ruby itself. Let's revert the implementation back to the original blocking read. We still need to call into the scheduler, so how can we do this? In our implementation, the scheduler is a local variable; however, we introduce a new special thread-local variable to store the current scheduler. The thread-local variable allows us to pass the scheduler as an implicit argument to methods invoked on the same thread.
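Pulling the pieces just described together, here is a sketch of the whole idea, assuming the simplified scheduler interface the talk describes (wait_readable plus run); the hosts, helper names, and structure are illustrative rather than the exact slide code.

```ruby
require 'socket'
require 'fiber' # Fiber.current on older Rubies

class Scheduler
  def initialize
    @waiting = {} # io => fiber, the waiting table
  end

  # Park the current worker until `io` is readable, letting others run.
  def wait_readable(io)
    @waiting[io] = Fiber.current
    Fiber.yield
  end

  # The event loop: wait for any IO to become ready, resume its worker.
  def run
    until @waiting.empty?
      readable, = IO.select(@waiting.keys)
      readable.each { |io| @waiting.delete(io).resume }
    end
  end
end

# Read a full response without ever blocking the thread.
def read_response(io, scheduler)
  response = +""
  loop do
    case chunk = io.read_nonblock(1024, exception: false)
    when :wait_readable then scheduler.wait_readable(io)
    when nil            then return response # EOF
    else response << chunk
    end
  end
end

scheduler = Scheduler.new

# A worker is nothing more than a fiber wrapping one request.
workers = %w[example.com example.org].map do |host|
  Fiber.new do
    socket = TCPSocket.new(host, 80)
    socket.write("GET / HTTP/1.1\r\nHost: #{host}\r\nConnection: close\r\n\r\n")
    puts "#{host}: #{read_response(socket, scheduler).bytesize} bytes"
  end
end

workers.each(&:resume) # run each worker until its first blocking operation
scheduler.run          # then drive the event loop until all are complete
```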
The Ruby implementation of IO#read is largely the same as our non-blocking implementation earlier, and internally it invokes rb_io_wait_readable. It is in this method that we add a little bit of magic: at the start of the method, we check if a thread-local scheduler is set, and if it is, we defer to its implementation of wait_readable. This allows all blocking I/O operations to be redirected to the scheduler, if one is defined, without any changes to application code. Finally, we have not revealed the implementation of the worker: it is, in fact, nothing more than a Ruby fiber. While the practical implementation has a few more internal details, this is essentially the entire implementation of non-blocking, event-driven I/O for Ruby.

The original concept for this design was started in 2017, and finally, in 2020, after testing the ideas extensively, we made a formal proposal for Ruby 3. This design specifies the scheduler interface, which can work across different interpreters, including CRuby, JRuby, and TruffleRuby, and how the existing blocking operations should be redirected. The actual implementation of the scheduler can be provided by different underlying implementations, which allows for great flexibility. This proposal included a working implementation for CRuby, and it was recently approved by Matz for experimentation in the master branch. This represents a great milestone for the project; we hope to iterate on the current proposal and deliver a solid implementation for Ruby 3.

How do we support existing web applications? Fundamentally, the scheduler is designed to require minimal changes to application code. However, we still need to provide the model for executing application code concurrently. In addition, not all blocking operations are due to I/O; this is especially true for the current database gems, which sometimes hide the I/O operations. Even though there are challenges, because this is a very exciting development, we found existing maintainers were willing to start making the necessary improvements.

For applications that want concurrency today, you can use Async. The core abstraction of Async is the asynchronous task, which provides fiber-based concurrency. We introduce new non-blocking, event-driven wrappers with the same interface as Ruby's IO classes; in many cases you can inject them into existing code and things will just work, without changes required. In Async, tasks execute predictably from top to bottom, so no special keywords or changes to program logic are required. In the case of Ruby 2, you must use the Async wrappers, but in Ruby 3, the scheduler interface allows us to hook into Ruby's blocking operations without any changes to application code. At the core of Async is the non-blocking reactor, which implements the event loop; it supports a variety of blocking operations, including I/O, timers, queues, and semaphores. Blocking operations yield control back to the reactor, which schedules other tasks until the operation can continue. The async gem is less than 1,000 lines of code, with good unit test coverage. We felt it was very important to reach a stable 1.0 release with a well-defined interface, along with a good test suite, because this provides a strong foundation for reliable applications.

"Lately I've been looking into Async, as one of my projects would really benefit from non-blocking I/O. It is really beautifully designed." (Janko Marohnić, on the general design of Async)

Async has a growing ecosystem of scalable components, including support for Postgres, Redis, HTTP, and WebSockets.
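As a concrete taste of the task model described above, here is a minimal sketch using the async gem (API as of async 1.x; the sleeps stand in for blocking I/O):

```ruby
require 'async'

Async do |task|
  # Two child tasks run concurrently on their own fibers inside the
  # reactor, so the total elapsed time is ~1 second rather than ~2.
  children = 2.times.map do |i|
    task.async do |subtask|
      subtask.sleep 1 # non-blocking: yields control back to the reactor
      i
    end
  end

  p children.map(&:wait) # => [0, 1]
end
```

Note there are no special keywords: the block reads top to bottom, and the yielding happens inside the blocking operation itself.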
One of our goals is to bring high-quality implementations of common network protocols to Ruby, with a focus on both consistency and scalability; almost all the components are tested for performance regressions. Because the Ruby 3 scheduler interface has been merged, we now have experimental support for that interface in Async. Once the interface is finalized, you will be able to make existing Ruby IO non-blocking when used within an Async task. Right now it's experimental, but we hope to release it soon. If you write existing code with Async, you will not need to make any changes to see improved concurrency with Ruby 3.

One of the most critical parts for many businesses using Ruby is the web application server. We built Falcon as a scalability proof of concept, and it was so successful that we've continued to develop it into a mature application server for Ruby web applications. Falcon is implemented in pure Ruby, yet in performance it rivals other servers which have native components; this reinforces our original assumptions about the diminishing returns of improving the Ruby interpreter. Out of the box it supports HTTP/1 and HTTP/2 with zero configuration, which gives you a modern localhost development environment, and it can also be used in production without nginx. It has a really great model for WebSockets, and we've recently demonstrated a single instance with one million active WebSocket connections. In order to reach this level of scalability, it uses one fiber per request, which means that your application's requests will be scheduled concurrently if they do blocking operations, for example database queries, web requests, Redis, and so on. To achieve parallelism across multiple CPU cores, Falcon uses a hybrid multi-process, multi-thread model, so we can support CRuby, JRuby, and TruffleRuby seamlessly.

So, is it scalable? Well, like anything, it depends on how you use it, but in various micro-benchmarks we have shown that it can perform very well, especially when there are operations which are non-blocking with Falcon but blocking on other systems. This graph shows Puma versus Falcon: Puma plateaus at 16 workers, while Falcon is able to continue scaling up until all CPU cores are pegged. We thought this was a great result, especially if your company is dealing with a higher volume of traffic; Async is the right model, because web applications are almost always I/O bound.

"The Ruby web ecosystem is really lacking in scalability, for example WebSockets on Puma. Async unlocks the next tier of scalability in the most Ruby-like way possible." (Brian Powell, on migrating from Puma to Falcon)

So, the big question: does it work with Rails? Yes, it's Rack compatible.
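Since Falcon is Rack compatible, trying it locally takes nothing more than a minimal config.ru. A sketch, assuming the falcon gem is installed and started with its serve command (defaults such as ports and localhost TLS may vary by version):

```ruby
# config.ru: a minimal Rack application, which is all Falcon needs.
# Run with: falcon serve
run lambda { |env|
  [200, { 'Content-Type' => 'text/plain' }, ["Hello from Falcon\n"]]
}
```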
However, there are some caveats, which we will discuss next. ActiveRecord is a commonly used gem for accessing databases. Unfortunately, ActiveRecord currently uses one connection per thread; because of this, multiple independent fibers on the same thread will be using the same underlying connection. This can cause problems, because multiple fibers could incorrectly share a transaction. This is kind of a design fault of ActiveRecord, and there is discussion happening right now about how we can improve the situation. I think Ruby 3 will drive these changes as the model for event-driven concurrency becomes clearer. Another issue is that database drivers don't use native Ruby IO and are instead usually opaque C interfaces. However, the maintainer of the postgres gem is interested in adding support for non-blocking I/O, and I believe we will see expanded support from other maintainers once the Ruby 3 concurrency model has stabilized. These changes should percolate up into the higher-level libraries and hopefully drive adoption. As an alternative, we recently started implementing a low-level database interface gem called db. It currently provides non-blocking Postgres and MariaDB/MySQL access; while we believe the existing gems may eventually adopt the event-driven model, we didn't want to wait for our own code.

Redis is another commonly used system for fast access to in-memory data structures; it is commonly used for job queues, caches, and other similar tasks. We introduced a PR to support event-driven I/O in redis-rb, but unfortunately it was rejected, so we have continued to maintain the pure-Ruby async-redis, which is almost as fast as the native hiredis implementation. One important point about the async gems is that they are guaranteed to be safe in a concurrent context; in our research, we found that many gems are not thread-safe, including, up until recently, the redis gem.

HTTP is a key protocol of the modern web. As such, we provide a gem, async-http, which has full client and server support for HTTP/1 and HTTP/2, and soon, we hope, HTTP/3. This gem fully supports persistent connections and HTTP/2 multiplexing in a completely transparent way that provides great performance out of the box, especially when combined with Falcon. We recently added support for Faraday with the async-http-faraday adapter (a sketch of the underlying client pattern follows at the end of this transcript), and we tried it out on several projects. The github-changelog-generator makes many requests to GitHub to fetch the details about a project; while we didn't work on all areas of the code, we saw a significant performance improvement by fetching commits and pull requests concurrently, and for bigger projects the improvements brought about by multiplexing and persistent connections are even more significant. In another script, a traditional HTTP gem was compared to async-http-faraday when fetching the job status of a large build matrix from Travis, I think about 60 to 70 requests in total. Because those requests could be multiplexed on a single connection, the overhead was significantly reduced: rather than making a large number of requests sequentially, all the requests could be made on a single connection. It's these kinds of results that make me personally really excited about the future of Async, Falcon, and non-blocking, event-driven Ruby.

As a bit of fun, I found this interesting experiment with the ffi gem, which we use for implementing the db drivers. It moves FFI calls to a background thread and then yields control back to the scheduler. While it's pretty risky in general, in some cases this can be a valid approach, and it offloads otherwise-blocking C functions to background threads.

So what are the next steps? It depends on you, the community: please try these new gems and features of Ruby and give feedback. The best thing you can do is have fun with these new projects; I hope they make scalability exciting. And to the businesses: please support this important work. We recently introduced a paid model for Falcon; if you are interested in this, please reach out. We want to support the Ruby ecosystem and help it continue to grow, especially in relation to scalability. If this is important for your business, you can help us develop it further by investing financially. Thank you for listening to my talk.
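For reference, the async-http client pattern mentioned above looks roughly like this sketch; the URLs are illustrative, and on an HTTP/2 server the requests to the same host can be multiplexed over a single persistent connection.

```ruby
require 'async'
require 'async/http/internet'

# Illustrative endpoints, not real services.
urls = %w[https://example.com/a https://example.com/b https://example.com/c]

Async do |task|
  internet = Async::HTTP::Internet.new
  begin
    # Issue all requests concurrently; the client reuses connections.
    requests = urls.map do |url|
      task.async { [url, internet.get(url).read&.bytesize] }
    end
    requests.each { |request| p request.wait }
  ensure
    internet.close
  end
end
```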
Info
Channel: RubyKaigi
Views: 8,199
Keywords: Ruby, RubyKaigi, RubyKaigi2020, RubyKaigi2020Takeout
Id: Y29SSOS4UOc
Length: 29min 17sec (1757 seconds)
Published: Tue Sep 08 2020