Gismo for Ray: A Multi-Node Shared Memory Object Store That Accelerates Ray Workloads

Captions
My name is Charles. I'm a co-founder and CEO of MemVerge. We have been developing big-memory software that aims to make memory bigger and faster, and to improve inter-process communication across many servers. What I'm going to introduce is a shared memory SDK that can help accelerate workloads on top of Ray across multiple servers.

The problem we're trying to solve is moving data between processes. When the processes are on the same node, it is easy; there are many IPC mechanisms within a node. But when data needs to travel between nodes, IO is incurred: there are many copies, there is serialization and deserialization, and when you need to use network storage, the speed of the storage media also slows down the application. So whenever an application needs to move from a single node to multiple nodes, it inevitably slows down because of the bottleneck imposed by IO. There was really no good solution to this until the emergence of a new standard called CXL.

CXL stands for Compute Express Link. It's a new memory protocol that runs on PCIe Gen 5 and allows memory to be extended beyond the DDR memory plugged into the DIMM slots. It can be added within a server, and it can also be made available as a memory appliance connected to multiple servers. This lets processes on multiple nodes access the same memory with the load/store semantics that memory supports, with very low latency and very high bandwidth, without going through a network, so data can be shared across multiple servers.

What we did is develop a new software layer we call Gismo. Gismo stands for Global IO-free Shared Memory Object. It presents an object-store-like API, but the objects are memory-mapped and directly accessible from your applications. We coordinate among the different nodes to make sure accesses are consistent, correct, and cache-coherent, and we present a very simple API where you can connect to the Gismo server and start writing to and reading from the Gismo store.

Here is the key difference from what Ray already includes. Ray has a great architecture, and it has a shared memory object store built in, but it's only very fast when the object is placed locally. If the object is on another server, what Ray does underneath is transport that object from the other node to the local node and make a local copy before you can access it. With our solution you do not need that copy, and you do not incur the serialization and deserialization caused by the network IO. Instead, you can create objects directly and allow them to be accessed from anywhere, and we ensure the consistency behind the scenes. That is the improvement we can add to the Ray system.

Now I'm going to show a demo in a video. This is a four-node Ray cluster, each node with four CPUs. We're going to show the baseline of running a shuffle benchmark on this four-node Ray, and then show the same cluster with Gismo integrated into Ray and how the performance compares. Over the last few months we collaborated with the Ray community and the Anyscale team to do the integration and to validate the performance improvement that cross-node shared memory can deliver.

Here is the demo. The text is a little small, so I'm going to try to read it out. First, let's see how many nodes are here. This is running the baseline: we have four nodes, each with 16 gigabytes of memory, and we're going to run a shuffle benchmark with about 20 gigabytes to be shuffled across these four nodes. This is using the original Ray with its built-in in-memory object store. As you see, it's taking a little while for these shuffle jobs to complete; let's give it a few more seconds. As you can see here, it took about 57 seconds for this benchmark to complete.

Now let's take a look at Gismo integrated into Ray. Again, the same four nodes, but now they are connected to the shared memory, so the shuffle can happen over the shared memory. It's the same 20-gigabyte data set. You can see the setup time is about the same, but the running time becomes much faster. The new time is about 30 seconds including the initial setup time, so it's about half the time when using Gismo, and if you take out the setup time, the improvement is even greater.

To show the more complete results we were able to achieve by integrating Gismo into Ray: getting a local one-gigabyte object is the same speed for the baseline and for Gismo. That is a great result, because with Gismo the objects are placed in shared memory instead of local memory, yet there is no performance degradation. When the object is placed on a remote node, access is about seven times faster: instead of about 2.7 seconds, we get the same speed as accessing a local object. The next line is the demo we just showed, shuffling 20 gigabytes of objects in about half the time. And when shuffling a bigger data set, the improvement is even greater: 2.8 times faster.

These benefits come for the following reasons. First, because Gismo uses shared memory access, it does no network IO and no serialization or deserialization, so the transport is much faster. Second, there are far fewer copies: you don't need to copy from A to B, or make all the intermediate copies as data goes through the stack. Third, it is a much more efficient use of memory. Before, with a distributed in-memory store, you could end up with multiple copies of an object because different processes were all accessing it, multiplying the memory usage. Now you only need one copy for all of the nodes, so you get more bang for the buck, you are less likely to run out of memory and spill to disk, and therefore the speed is much faster.

As I mentioned, we are working with the Anyscale team and the Ray community. We just submitted an enhancement proposal to incorporate a pluggable object store into the Ray architecture, so that Gismo or other object stores can be plugged in very easily. You can see the details of this enhancement proposal at the link shown. We have other improvements coming as well: offering bigger memory to the Ray framework, in addition to the shared memory that improves internode communication. We can dynamically add more memory capacity to your nodes, so even in data-skew situations you do not run out of memory and hit OOM errors, and we can also dynamically scale the memory bandwidth that is often a bottleneck for various AI workloads.

If you have more questions, email us at gismo@memverge.com. We also have a booth over there, and we'll be happy to answer your questions. Thank you very much. [Applause]
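The talk describes Gismo's API only in broad strokes: connect to a Gismo server, then write and read memory-mapped objects with no copy on the read path. The real SDK is not shown in the captions, so the sketch below is only a single-node analogy built on Python's standard-library shared memory; the class and method names (`GismoClient`, `put`, `get`) are hypothetical, not the actual Gismo interface.

```python
# Single-node analogy of the put/get, memory-mapped object API described
# in the talk. All names here (GismoClient, put, get) are hypothetical;
# Python's multiprocessing.shared_memory stands in for CXL shared memory.
from multiprocessing import shared_memory


class GismoClient:
    """Toy object store: each object lives in a named shared-memory
    segment that any process on the node can map directly."""

    def __init__(self):
        self._segments = {}  # object name -> SharedMemory handle

    def put(self, name: str, payload: bytes) -> None:
        # One write into shared memory; readers map the same pages.
        seg = shared_memory.SharedMemory(create=True, size=len(payload), name=name)
        seg.buf[: len(payload)] = payload
        self._segments[name] = seg

    def get(self, name: str) -> memoryview:
        # Attach by name (or reuse our handle) and return a zero-copy view.
        if name not in self._segments:
            self._segments[name] = shared_memory.SharedMemory(name=name)
        return self._segments[name].buf

    def close(self) -> None:
        # Release local mappings and remove the segments (demo cleanup).
        for seg in self._segments.values():
            seg.close()
            seg.unlink()


payload = b"hello from shared memory"
client = GismoClient()
client.put("gismo_demo_obj", payload)
view = client.get("gismo_demo_obj")
result = bytes(view[: len(payload)])  # copy out only to inspect; the view is zero-copy
del view  # release the buffer before tearing the segment down
client.close()
```

In the real system the interesting part is that the segment would live in CXL-attached memory visible to other servers, so a reader on another node could do the same `get` without any network transfer.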
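For context on what the 20-gigabyte demo is exercising, here is a minimal pure-Python sketch of the shuffle pattern: map tasks partition records by key, and reduce tasks each gather one partition from every mapper. This is illustrative only, not the benchmark code from the demo; in Ray each bucket would be an object in the object store, and it is exactly these cross-node bucket fetches that a shared-memory store avoids copying.

```python
# Minimal sketch of a map/reduce shuffle. Two mappers and two reducers
# stand in for the 4-node, 20 GB demo; the data is tiny and illustrative.
from collections import defaultdict


def map_phase(records, num_partitions):
    """Split one mapper's records into num_partitions buckets by key hash."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash(key) % num_partitions][key].append(value)
    return buckets


def reduce_phase(parts):
    """Merge the buckets destined for one reducer from every mapper."""
    merged = defaultdict(list)
    for part in parts:
        for key, values in part.items():
            merged[key].extend(values)
    return dict(merged)


mapper_outputs = [
    map_phase([("a", 1), ("b", 2), ("a", 3)], 2),
    map_phase([("b", 4), ("c", 5)], 2),
]
# Reducer i pulls bucket i from every mapper -- the "shuffle" step.
reduced = [reduce_phase([m[i] for m in mapper_outputs]) for i in range(2)]
total = sum(v for part in reduced for vals in part.values() for v in vals)
```

Every value survives the shuffle (`total` is 15 here), and each key ends up wholly in one reducer's partition, which is the invariant the benchmark's shuffle must preserve at 20 GB scale.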
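The talk mentions an enhancement proposal for a pluggable object store but does not quote its interface, so the following is only a guess at the general shape such a plugin contract might take; `ObjectStorePlugin` and its methods are hypothetical names, not the proposal's actual API.

```python
# Hypothetical sketch of a pluggable object-store contract for Ray.
# The real enhancement proposal may define different names and methods.
from abc import ABC, abstractmethod


class ObjectStorePlugin(ABC):
    """Backend-neutral put/get contract a worker could code against."""

    @abstractmethod
    def put(self, object_id: str, payload: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class LocalDictStore(ObjectStorePlugin):
    """Stand-in for a per-node store: objects live in this process only,
    so a reader on another node would need a network copy."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, payload):
        self._objects[object_id] = bytes(payload)

    def get(self, object_id):
        return self._objects[object_id]


# A Gismo-backed plugin would implement the same contract, but put()
# would place the bytes in CXL-attached memory visible to every node and
# get() would return a zero-copy mapping instead of a transported copy.
store: ObjectStorePlugin = LocalDictStore()
store.put("obj-1", b"shuffle partition 7")
fetched = store.get("obj-1")
```

The point of such an interface is that swapping the backend (local plasma-style store versus cross-node shared memory) would not require changing application code.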
Info
Channel: Anyscale
Views: 567
Id: sJmM47ce48I
Length: 8min 57sec (537 seconds)
Published: Thu Oct 12 2023