Rust at speed — building a fast concurrent database

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Hey Jon!

I spoke about Rust and Postgres at this exact venue just a few months ago, organized through the RustNYC community though. It bums me out that I wasn't aware of your talk about Noria because I would have definitely gone (not sure whether this was a public event?).

You're generally not in the NYC area, right?

Is anyone working on a postgres adapter for Noria yet?

👍︎︎ 29 👤︎︎ u/darin_gordon 📅︎︎ Jan 05 2019 🗫︎ replies

I really like the idea of maintaining the materialized views live.

I still remember a very simple task that I just couldn't manage to get to work with acceptable performance on Oracle (v11?): a simple, frequently updated, table folders contains essentially 4 columns: folderId, actedOn, messageId, publicationTimestamp.

How fast is select count(*) from folders where folderId = ? and actedOn = 'N';, with an index on folderId/actedOn? It's O(N) in the number of rows matching.

Maintaining a folderId <-> count was impractical: it caused high contention on the most popular folders. I never got to attempt the idea of a folderId <-> count with 10 or 20 rows per folder, using randomized distribution of the increment/decrement to reduce contention by taking advantage of row-level locking; in the end we just axed the idea of an accurate count for folders with more than 1,000 messages and would display 1,000+ instead.

So... how do you think Noria would manage on this benchmark:

A constant stream of updates inserting/removing messages in the folders table with folderId X and actedOn 'N'.
A constant stream of updates flipping actedOn from N to Y (never in the other direction).
How fast can you get select count(*) from folders where folderId = ? and actedOn = 'N'; to be for folderId X?

(Assuming MVCC, if the count returns 3 and we select the messages, we want 3 of them, not 2, not 4)

👍︎︎ 14 👤︎︎ u/matthieum 📅︎︎ Jan 05 2019 🗫︎ replies

Great Talk. Just wanted to say that you have found a great way to structure it. Its not easy to "squish" that much information in such a short amount of time. It was really helpful and i am eager to read the paper now. Turns out the evmap was also very interesting to me because i don't think there is much choice of concurrent hashmaps in rust currently – at least i think that is the case. May i take the opportunity to ask if there is a way in evmap to have an insert followed by a refresh in such i way i can't forget to call both like in a insert_refresh method? I see many benchmarks use this pattern and i think you talked about it in your talk as you compared noria to other databases that this was the pattern you used for comparison. I think it would just be more convenient. I couldn't find it in docs. Another thing that could be quite useful with the way you structured evmap is something like transactions for maps. Having something like a discard method that would discard every inserts(and updates etc.) to the point of last refresh (so basically a copy of the current read map) sounds like a useful thing to have given the way evmap works. Anyway Thanks for this talk, your streams (especially the async/tokio one) and the crates and overall work you've done!

👍︎︎ 11 👤︎︎ u/asmx85 📅︎︎ Jan 05 2019 🗫︎ replies

Good video, looking more into this DB 🤔

👍︎︎ 6 👤︎︎ u/jovonni 📅︎︎ Jan 05 2019 🗫︎ replies

Oh hey, it's you! Really pleasant talk, thanks! I wrote a small program using fantoccini just yesterday, so thanks a lot for that library too. The docs helped a ton.

👍︎︎ 7 👤︎︎ u/pwnedary 📅︎︎ Jan 05 2019 🗫︎ replies

Holy shit. That double map / deferred write is pretty cool and easy to understand.

I'm not really a computer scientist or programmer but i'd like to ask if this idea is the same as read-copy-update exclusion i've heard about in the linux kernel?

👍︎︎ 4 👤︎︎ u/SCO_1 📅︎︎ Jan 05 2019 🗫︎ replies

Hi, I really like your work Jon and I believe I saw it in the past on reddit.

Firstly, the adaptive element of this is amazing and the cache principle is very good compared to memcache/cache-stampede/cache-invalidation of the past.

However, as a DBA/data-performance-engineer, you cannot really compare to an actual materialised view that someone took the time to data model well.

If I setup a manual materialised view on MySQL (with a refresh per hour/5min or trigger based), it might actually beat those 14million reads. Maybe if I throw in ProxySQL in there with a TTL cache, it could.

Not only that, but you're half way to setting up a data mart/data warehouse as well.

I am regretful that MySQL and to a large degree Postgres, does not really have materialised views like Oracle and MS SQL. I believe that had it done so, we would have not seen 2/3 of all the middleware/caching we are dependant on today in the open-source world.

Hopefully, one day someone comes up with a Rust-based database that has fast materialised views. And they will probably use your evmaps.

👍︎︎ 6 👤︎︎ u/tkyjonathan 📅︎︎ Jan 05 2019 🗫︎ replies

That was an amazing talk! And as always with your Rust content, I found myself learning stuff once again -- you're great at explaining complicated things.

👍︎︎ 3 👤︎︎ u/Hexjelly 📅︎︎ Jan 06 2019 🗫︎ replies

Oh, and for additional context, this is the original Reddit post for evmap, and this is the one for Noria.

👍︎︎ 3 👤︎︎ u/Jonhoo 📅︎︎ Jan 06 2019 🗫︎ replies

Captions

hi welcome everybody I'm Larry Rudolph from the labs team part of the goal the Lapps team is to reach out to academia and this is a special talk I met John at csail when we were when Trevor and I were there talking about security of firmware and we had this idea that we should use rust and so we met so there's the MIT rest expert or the Northeast rustic spirit or the worldwide rust expert I don't know which one call yourself with but his research is also on databases novel ways databases and coordination and conflicts and stuff that I really like so when inviting to give a talk I said you have to talk about both things and John said no problem I guess he comes here from I guess you had undergrad in Australia and then you went to London and I was from Norway you know he went the wrong way and then finally sort of came back up and ended up at MIT and he's a grad student at my Tec cell but let's so he's going to give a talk and then there's be a deep dive or extended Q&A if you want to stay and ask but let's make him welcome thank you so I'm John thanks for the introduction I'm here to talk to you about rust in general and in particular about a research system that I've been building at MIT for the past three years it is essentially a fast concurrent database that tries to get a database is slightly differently from how we've been designing them in the past and I part of what I want to impart is what it's like to use rust to build something that's sort of large and and real in some sense because I think that's a lot of the reservations that people have about rust so I'm John I'm a PhD student at the MIT parallel and distributed operating systems group I work on a database called nuria that's now about sixty thousand lines of russ code that we've been developing over three years I also maintain several open source rust libraries I work a lot on the asynchronous i/o part of rust I've done about 30 hours or so of live rust coding stream so you can find online if you're interested and so if you want to look any of that then you can find me online at any of those URLs or just shoot Larry and email he will forward you to me I'm sure he will do that correctly as for this talk there are some things we're gonna cover and some things we're not gonna cover and I think it's useful to just set that up in advance in particular I'm going to look at rust from a relatively high level I'm not gonna dive into like the syntax and code I am gonna talk a little bit about rust ownership because it's one of the more novel parts of rust one of the things that a lot of newcomers to rust struggle with but I will not sort of go deep into any of that we can do do that in the deep dive afterwards I'll also give the overall architecture of nuria this new database and in particular dive into one of the interesting data structures we have to design for nuria that hopefully will appeal to the engineers in the room which is I assume basically all of you I will not talk a lot about sort of query execution query planning deep database stuff if you want to read any more about that you should look at the paper or come talk to me after in general the idea of this talk is to look at higher level aspects of using Russ for real and so therefore the first thing we're gonna look at is what is rust like why am I talking to you about rust at all well rust is a programming language by Mozilla it's relatively new it reached 1.0 at about four years ago and it's for systems programming which is a very vague term but the basic idea behind systems programming is you want to have a language that lets you build systems that do stuff so they're not necessarily just sort of tools or command-line utilities but they're for writing like servers or clients or basically anything as a system so it's very simply poorly defined but the RUS team likes to use a bunch of these sort of quotes that define what the language does one of the ones they're relatively well-known for is this fast reliable productive pick three and the idea is that in a lot of existing production languages yours are forced into this I have to choose away something I have to choose to not get a particular feature like in C you don't get save code but you get really high performance and Russ wants to get a do away with those trade-offs instead saying let's try to do all of them in particular Russ takes this approach with concurrency so in rust they have the slogan of fear concurrency you should be able to write code that is concurrent without worrying that you have data races than your code is gonna crash all the time the Russ language is also very community driven so it is entirely open source everything is run on github there's an RFC process for extended the language for sending the compiler and I have several things that are now in the compiler available in the language and you can - and you can get involved in that process and that is one thing that is really neat about this language as opposed to many other languages Russ is on a six-week release cycles and new things also get released fairly quickly now you might wonder well how is referred a lot of there are other programming languages out there like why are we not all using Haskell or oh camel or go for that matter what is different about rust well the first thing is rust is a really powerful type system you can think of this in terms of sort of more like Haskell than it is like C this turns out to let you express constraints in your code a lot better than you can do if you don't have good types you get zero cost abstractions so the idea is that you can have abstraction layers you can have things like interfaces but the code want amorphous is so that the code is still fast you get the same kind of speed as C you have the borrow checker which tries to guarantee that you have no data races at runtime so the compiler will check that you have no data bases in your code and the tooling of Russell is just really really excellent it hasn't actually taken the time to learn from the past sort of n decades of programming language development and infrastructure development and incorporates that into the language so what do you what would you use for us for well Russ is really good for writing low level systems code so the sort of space is traditionally occupied by C or C++ and part of the reason for this is because as opposed to many other languages it explicitly tries to be interoperable with C and C++ that means that if you have existing libraries you have in existing systems that are built using those languages you can take rust and you can swap out just small parts of it and still have it work well with the existing software that you have and in fact this is why Mozilla developers in the first place it was sort of the firefox browser to be able to swap out components of it with something that was highly concurrent and written in Russell they knew that it wouldn't crash but you can also write high-level software in rust so rust because of its type system is quite pleasant to work with and you will find that you can actually write code that looks almost like Python code except that it is also fast and also like the compiler will check if you have an incorrect type as opposed to you know crash at runtime in fact I've found that I write both high level systems code sort of network services servers like I wouldn't go and I can write tools and infrastructure like command line utilities and rust and feel perfectly happy doing so but let's dive into that's like why specifically I've talked in very vague terms let's look at some specifics so first of all and this is an important point in rust lots of concurrency bugs are impossible the compiler will not let you compile a program that has a data race in it and if you come from basically any other language this is very surprising right in C you will write a program and you expect it to crash at runtime the first few times do you run it this is not the case with rusts code it can crash at runtime for sure but there are lots of these problems that just will not be there you also no longer have things like use after free or double free the entire ownership system guarantees is also at compile time the type system provides things like enums and you might think oh I know about enums like we have younam's in other languages in rust enums or what known as abstract algebraic data types and algebraic data types are basically enums that can contain other stuff in particular for error handling rust uses the result enum the result enum is generic over two parameters the value that is contained if you get an okay and the value that is contained if you get an error so if you think of writing go code for example you often return this sort of couple of the value if everything went correctly and an error and you need to know like if error is nil then you do something otherwise do something else or you might like return a null pointer or something like that if something went wrong in rust you just return a result that you the compiler forces you to check is either okay and then it will give the value inside that okay where it's an error and it will give you the value of that error so this means that it forces you to explicitly deal with things like errors null pointers as well are encapsulated in enums is an option type that is either none or its that contains the value that it is actually pointing to all of these also come at zero costs at runtime and finally rust has this thing called pattern matching which is survey you can think of it as the switch statement but on steroids unless you do things like if this is let's say this computation succeeded and also the value it gave was between 4 and 17 and also this other value is the string foo you can do really advanced pattern matching which helps write sort of code that feels more Python s core even Java Script e then it feels like low-level C code and I talked about the tooling a little bit so Russ says dependency management built-in so you can think of things like NPM for JavaScript or pi PI for python or even for go things like go get except it's much more advanced you can do dependencies that are path dependencies that are gift dependencies that are dependencies on the central library infrastructure that rust has and all of it is just built into the default tooling that you get and works really well the documentation is built into the language for every library every file you write if you take a function a module even a constant you can prefix it with a comment with three slashes instead of two now that is documentation if you write code in that string it will be automatically turned into a test so the your documentation cannot be out of date in any code samples that it has all of this is integrated in local language any library that you upload to the repository of libraries that rust has all of that code is automatically documented and interlinked so that any documentation you look up will just always look the same it will always be interact with other crates is actually fabulous to use and I do not use that term lightly and finally and I mentioned this a little bit when it comes to C and C++ it is really easy to do F of I so foreign function calls whether that is calling into a c library is C++ library there's interoperability with Python with Ruby with go that you can make calls into those languages or of those languages call interest very very easily and that means that it's easy to integrate into existing code base now all that said there are some things that Russ is not good for and I think it's important to highlight those in part because the language is improving in these regards but also because it's worthwhile to know the first of these is that Russ is not yet great for highly specialized systems and by that I mean if you have something like Intel DPD K or very highly vendored libraries that are large and expansive and written in some other language it is not trivial to just sort of start using from Rosland you couldn't take DP DK and generate sort of bindings and rust for all of the seed libraries they give you but it would be a lot of work and if no one else has done that work yet that's a lot of overhead you have to do and that you might not want to maintain that is getting there slowly but surely the people are developing really good infrastructure for DP DK or for Amazon ec2 or for whatever other sort of large project that exists out there today but it is slowly getting there the other thing is Russ is not great for rapid prototyping if you're just kind of like do a quick and dirty write something up just you're just gonna run at one so you don't really care if it crashes if you just want to write something and not worry about the details Russ is an awful language for that job because Russ will force you to write correct code not always it will not always be correct but it is pretty unrelenting about if you're if you're gonna write code and it's gonna compile it shall not have databases like you are not allowed and the type system is fairly involved like it forces you to deal with errors and force you to do with the case when things are null and that's sort of contrary to the idea of rapid prototyping linear kernel programming also a little bit finicky with Russ currently because you don't have things like Lib C the standard library it is possible some people have done it but writing kernel modules and Russ still requires a bit more work GUI programming because gree programming is hard and they're there bindings to gtk and QT and win32 even for Mac OS there are a bunch of bindings for the the UI libraries there but the bindings are still pretty young in part because the language is young too an asynchronous networking is one thing that Russ is doing a lot of working currently on building good primitives for dealing with asynchronous computation in general and i/o in particular the language is taking a pretty interesting approach that looks a little bit like promises in JavaScript and it works pretty well I've been using that in noria in the database I've been building at its highly performant but the code can get a little bit nasty to write because not all the high-level distractions are in place yet and so that is also some where the language is still developing okay so let's get to the database part of this talk we've talked a little bit about what rust is and I'm gonna switch sort of sites entirely and talk about databases for a second but hopefully something that you find interesting so nori has this project I've been working on for a while and the basic observation is that you have a bunch of applications that issue queries like this this is extremely common that you just have your application issue lots of sequel queries in this case this is it as a story's table and a votes table you count up the votes for every story and you want to extract the story title and the vote count for some specific story all well and good what this ends up being in a traditional relational database is it turns into something like this you have a front-end that issues that query to the back-end database the database does some query planning and decides that oh in order to execute this I have to count the things in boats and then I have to join it with stories and I'm gonna do a filter and let me give the results back this is all very well and good this is what basically every database out there today does the problem is for most applications almost all of the stuff that you do is read almost all your operations are selects and the selects in a database are pretty expensive because every time you do a read you have to redo this computation so even if the data that's underlying you hasn't changed you have to do all this stuff on every read and if 90% of your requests are reads that is a lot of wasted cycles and we don't like to waste cycles so some of you might think ah we solve this we're just gonna put a cache there right and caching solves everything like we're just gonna stick memcache key or Redis or something in front of our database and who everything is fast but unfortunately it's not quite that simple in fact your application is more likely to look like this where you have to make sure that like every time you write you invalidate the appropriate you don't eval aday to much because then you throw your entire cache away and now when you do a read you might miss you have to go to the database you can run into really weird situations that many people don't even think about like if you have a skewed distribution of popularity in your database then it might see a lot of rights and a lot of reads you get a right for a popular key you invalidated all of the reads now start missing because they were all for that popular key all of them go to your database at the same time in your database falls over and this is known as thundering herd and it requires a lot of really sophisticated mitigation to get around there's a facebook paper on how Facebook's tries to manage their my sequel and memcache D caches and runs into this problem and the solutions are highly non-trivial all of this seems like so much complexity in order to make your application fast so what we did in Oriya was essentially observed that all of this is stupid the database has your queries so why can't the database just maintain the cache wouldn't that be a lot easier so nuria instead of doing all of that business does it a regular sort of query planning like you do in traditional databases but instead of executing the query on read it execute the query on write and it just so incrementally so every time a new every time a new write comes in its gonna look at all the queries the application has given it in the past and it's gonna basically take the existing query results for that query take the new write process it through the query plan and update the cache results in place this may seem a little strange so let's do a quick walkthrough of that so here's the same query that we looked at earlier but notice that there's this results thing at the bottom which is that what's known as a materialized view you can think of it as your cache it stores the current results for that query and you see that the currently for some key a the story title is foo in the vote count is 41 now a new vote comes in for story a it enters the this is a data flow graph that basically represents the query that we're executing the vote for a comes into the count operator the count operator goes oh the previous count was 41 the new count is 42 so it emits a record that says a is now 42 instead of where it was 41 that record then hits the join operator the joint operator goes oh I got a thing for a I'm gonna look it up his stories and find the title so it time for words a is 42 and the title is fou that then enters the cash and the cash goes oh I have a key for a already I'm gonna replace it with the new value for a so now is 42 so now any at this point any read that comes in is gonna read the old vote count and at this point any query for that view or for that query that comes in is gonna see 42 is about count and so the application all it has to do is do a single lookup into the cache it never has to do any invalidations the right just go to the database and the database does the right thing your cache is always up-to-date so that's the basic idea behind Nuria and it turns out this works really really well I'll show you some benchmarks later this is a great idea but it does have some challenges like this is research after all the first of these challenges is you can't just store all the results for all the queries you've ever executed that is not okay right you don't have enough memory for that at least not yet and so you need some mechanism to ensure that you can evict things from your cache but that is not straight forward into this design imagine that we threw of it away that all of the results for the story a and then a new vote comes in for the story a the count operator no longer knows what the previous count was so it can't emit what the new vote count is and so you need a mechanism your system to ensure that even if you throw something away or something gets asked for that you've never been asked for before that you have some way of computing it or recomputing it this is the problem of partial materialization that we will not get into in this talk but it is a really interesting problem the second one is that I've just shown you a single query and I've shown you a crate that does not change over time but in reality applications don't just issue one query they issue many queries and beyond that over time the application develops the application grows and evolves and it might change existing queries or add new ones and the database has to react to that the database has to know what to do and it is not trivial because existing data flow systems so you might have heard of things like Nyad or for that matter even spark does not really deal with things changing while they're running you sort of have to throw the thing away and start up a new instance this has the new set of queries we don't want to do that because that's a lot of extra work we all the system to just dynamically adapt and that is something that Nuria is able to do and you will have to read the paper to figure out why the third challenge I'm gonna tell you about and this is the one we're gonna dive into is you'll notice that in noria we have this results table that is where all of your rights go there's basically your cash but it's also where all the reads go now you have lots of readers and lots of writers all trying to hit the same thing right that is problematic you can't just do that unless you have some kind of synchronization because a writer might update a record while it's being written so we need some kind of synchronization here and we want that to be really fast because they're gonna be lots of reads and lots of writes to this at the same time so we're gonna dive into challenge three four one and two you should be the paper I can send out the the length of the paper afterwards the I promise you the solutions to the other problems are also very interesting but I do not have the time to cover them today but we will talk about that last challenge because it happens to introduce and require a pretty interesting data structure before we do that though we're gonna have to talk a little bit about some basic computer science in particular we're gonna talk about this notion of ownership so ownership is a key tenant of rust and the basic idea is similar to what REI is and C++ or in many other languages that if you own something then you are responsible for that thing so if you own some piece of memory you are the one who's responsible for freeing that memory whenever it's appropriate to do so in particular rust extends this width if you own something you also get to choose who has access to that resource and how in rust there are basically three types you could have for some type cheese so think of T here is any type in your system it does not really matter it could be a string it could be a number could be an array whatever for any key you can either have just the T which means that you own that thing only it means that you are responsible for freeing it you can have a mutable reference to tea that's how to read that second part immutable reference to tea is just like a pointer to tea in any other language but with the additional contract or the additional guarantee that no one else is able to read or write that tea as long as you have that mutable reference you have exclusive access to that tea but you do not have any responsibility to free it after you're just borrowing it the last type is an immutable reference to tea it is also just a pointer to tea but with the additional contract that you are not allowed to modify tea because there are going to be other people who are also reading tea these are the basic three types of tea and this is basically the entire ownership system in rust that these are the three types and rust will at compile time check that you have not violated these contracts it's gonna check at compile time that for every variable for every piece of memory for every resource in your program you never have both immutable reference and an immutable reference at the same time do you never have multiple mutable references either of those not okay your program will not compile it is fine though to have multiple immutable references right because you can have lots of things read things at the same time and you can think of this always almost as the compiler requires you to prove that your program does not do this and it turns out that's all you need to guarantee the program has no data races right if you never have two things if you never have something modify a thing while it's being read or modified you cannot have data races because you will either just have multiple readers or you'll have a single writer which is fine you also guaranteed that you only ever free things once because the owner is responsible for freeing it and you can only have one owner furthermore the the borrow check are also checked you don't use anything after you have gotten rid of it in particular this is how the borrowing system works it adds this notion of a lifetime so if you borrow any tea that borrow of tea is assigned a lifetime and you can think of the lifetime as how long you're allowed to access tea for so for example if he lives on the stack the lifetime that's given out for any borrow of tea it's gonna be tied to the stack frame of the thing that as T and the compiler will check that when T goes away when that stack frame is popped and T is freed there are no outstanding references to T so if you try to take say a reference to something sort of the stack like a buffer or something and give it to another thread the compiler would go no that thread can live longer than this stack frame therefore your program is not safe so it will not compile so why am I telling you all this this seems like a lot of sort of CS nonsense well it turns out you can frame the problem that we looked at in the beginning of this concurrent access to the results table as ownership right because what we're trying to do is allow concurrent reads and writes to a shared structure which russ has just told us is not okay you are not allowed to have something mutate in something read at the same time the waste way rust mist mitigates this is by forcing us to use synchronization so there's certain primitives in rust that allow you to take an immutable handle and get a mutable handle to a tea one of these you have probably heard of its called a mutex right in rust you will not be allowed to access a shared value without using something like a mutex does it specifically have to be a mutex it has to be something that safely provides a mutable access to something that is shared so let's look at what a mutex looks like in rust in in rust you have a mutex in this case that contains a tea again tea can be any type and notice that in rust the mutex wraps the tea so you have no way of getting at the tea without also taking the lock when you lock a mutex it takes a immutable reference to self here I've annotated annotated it with a lifetime of MTX for mutex right so the the thing you get back from the ejects can't outlive the mutex because the mutex would have gone away and what you get back is this single the mutex guard which you can think of as a sort of credential that you are currently holding the lock right so when you take the lock you take an immutable reference to the mutex and you get back a handle to the inside of the lock that inside of the lock has a get method that as long as you have Mew if as long as you have exclusive access to the credential you get exclusive to the tea right so what this entire chain is saying is that as long as the mutex is still alive and you have the lock you can get a mutable reference to whatever is inside the lock and what this is giving us is mutable access to a tea through a shared reference to the mutex and it does this by mutual exclusion right the internally the mutex knows that as long as it doesn't look keeping right this is safe to do because no one else is gonna have a mutable reference to that tea because they wouldn't have the lock they wouldn't have the credential therefore they can't access the team a micro well isn't this problem to solve like you just wrap our wrap our results table in a mutex and everything's good well no quite so mutexes are terrible because well they can be ok but in this particular instance they're really bad because they force you to have sequential access if you think of our results table you either have to do a read or a write for any given unit time which is terrible because we want to do things in parallel we want this to be current and fast and new Texas are neither concurrent nor fast so you might go hah but I know I know what you're gonna say use an RW mutex yeah so reader/writer lock is something that you may or may not be familiar with the basic idea behind behind a reader/writer lock is that you can have multiple readers take a lock at the same time or you can have a single writer take the mutex but you cannot have both this has a similar flavor to ownership right you can either have a single mutable point or exclusive pointer or you can have many immutable and that's great right it lets us do many reads in parallel awesome that's all we wanted and then every now and again you sort of have to take a break and do some writes is that good enough do we think that this solves our problem so in order to answer that question we need to look a little bit at how reader/writer locks work so this might be a little small terrine but the exact details are not important just trust me that this is actually how reader/writer locks work and pthreads the basic thing to observe is that there's a lock in here so whenever you take a reader the reader part of a lock so the thing that's supposed to let many threads take the reader lock at the same time whenever you do that you have to take this exclusive lock first you don't take him for very long right notice that you release it before you return but you have to take it whenever you take the read handle and whenever you give it back so what this means is that it is okay if you have a long critical section because taking that lock is very little of your workload right if you're taking the read lock and then doing lots of work and then releasing the read lock taking the read lock is probably not where you're spending your time and so the overlap is gonna be almost perfect you'll get lots of concurrency however if you have a short critical section then you're gonna take the reader lock you're gonna immediately release it because you're only did a few instructions in between if you're doing that taking the reader lock is going to be your bottleneck but taking the reader lock is a sequential operation and so now you're basically ending up with your program being sequential again all we're doing while holding the reader lock is doing a hashmap lookup how's my glue lookups are really fast therefore we're basically benchmarking the performance of taking a lock and I'm going to prove that to you by running a benchmark so this is a reader/writer lock on top of a hashmap and what we're doing along the x-axis is that we are increasing the number of threads that are doing reads of the map and I'm showing on the y-axis the number of reads per second and the number of writes per second and you can see that initially it's fine right like we get more threads and we get more read throughput that's great but over time we're getting dominated by taking this lock in the read so you see with few threads we do get the speed-up because there's some parts where the hashmap gets get to happen in parallel but when you get to more threads you're spending all your time contending on the lock that's not good the question is can we do better right we'd like this to be a straight line it would be the best thing we can possibly hope for you can't really have it be super linear that would be weird so now we need to do with that locks so remember how locks provide a safe wrapper around aliased or shared pointers right this is what we talked about earlier it gives you mutable access through a shared pointer but aliasing brings data races right this is the whole problem that we haven't see that you might just have two things the point of the same thing and try to mutate it so we're gonna have to need unsafe now in many of your heads so we're going to be like alarm bells going off now oh and say if I've heard about this in wrestling this is terrible this basically gives up all guarantees and Russ is now basically see please breathe it will be okay so it turns out what did I do no again there you go it turns out that writing unsafe code is really just writing C code whenever you type the unsafe key word Russ is saying you're now allowed to do the things you're allowed to do and see there's nothing more or less magical to it in fact all of the basically everything that unsafe lets you do is against you the ability to dereference a role pointer nothing more nothing less which from Sealand seems obvious like of course I can be referenced wrong pointers in rust you cannot it is not okay to dereference or pointer you can dereference references because they have a certain contract associated with them but a row pointer dereferencing it who knows how many other people have that same row pointer it is not okay but unsafe lets you do it in particular we can introduce this concept of a row pointer so a star mute of T is just a pointer to T with no additional contract it says nothing about who else might have that pointer all that's saying is that I know the memory location of a T it has no lifetime associated with it which inherently is unsafe right who knows this could point into the middle of the stack somewhere and that function might have returned who knows and then unsafe let's you take one of these and turn it into a mutable reference to T so basically you can give it out along with the contract saying I promised that this pointer is now exclusive to you no one else is using it and doing that is unsafe because you and are internally maintaining some invariant that makes that true you can also do things like changing types using this right you can this is basically pointer typecasting and see the idea of unsafe is that you are now responsible for guaranteeing that there are no data races because there's no associated contact with the Pointer anymore you are claiming by writing unsafe that you know that this is okay which is what you're doing all the time in C it's just in rust you have to be explicit about when you do that and what's nice about this is that now you know exactly which parts of your code to audit if you crash with the data race because it's all the places that have unsafe a place that does not have unsafe is forced to follow the contract so we're going to use them safe and we're going to come up with a following plan so this is the data structure that we're gonna end up implementing we're gonna keep two maps the we're gonna keep two pointers one pointer is going to point to one map the other pointer is going to point to the other map all of the readers are going to go through the top pointer and all of the writers in this case let's just imagine there's just one it's gonna go through the bottom pointer and we're gonna make sure they always point to different maps right so all the readers initially go to the Left map all the rights initially go to the right map whenever the writer decides that they want to reveal the rights that they've done to the readers this could be every right or they could decide to do it less often so to give eventual consistency but whenever they decide that the rights should be revealed they're gonna swap both pointers right so the rights are then gonna go to the left and the reads are now going to go to the right and notice that the reads now get to observe the new rights and then the writer is gonna reapply the rights that it has done to the right map to the left map right so that now the two maps are again up to date but there's a problem with this if you just did this naively there could still be readers left in the left map imagine that there's some reader who read the pointer and then it's gonna like start to look through the buckets of the hash map but then the writers swap the pointer so now the writer thinks that they're allowed to modify the map but the reader is still looking through the buckets of that map but it's not okay it's a data race and your program is gonna be wrong so we need some mechanism to deal with this right now the way to deal with this is not to add locks because we're not allowed to do that right locks would trivially solve this problem but locks are the problem in the first place we're not gonna have any locks we need to know when the writer has exclusive access to the left map when are there no more readers left the way we're going to do this is by introducing epic counters so every reader is going to keep just a local counter there knocking it's not going to be synchronized in any way this is going to keep a local counter to that reader and every time they do an operation on the map they're going to increment that counter so they're basically like a counter of how many reads I've done and then what we're gonna do is we're gonna have the writer look at all the epic counters it's gonna do the pointer swap look at all the epic counters until it is seen all of them tick up by one I want you to think about that for a second why is that okay why is that sufficient why is that invariant enough to guarantee that there are no readers well think of it this way whenever a reader takes up its count you know that for a to do another operation it must read the atomic pointer so if it does or if it's in the middle of some read then who knows whether it's read the pointer or not but if you see it increase you know that it is finished whatever operation it did in the past and it is now about to read the atomic pointer again so if you observe a tick in every reader after you swap the pointer you know that all of the readers must now have seen the new value therefore they must all now be in the right map and therefore the writer has exclusive access to the left side now this is not sufficient either now some of you may be able to realize why in particular what if a reader is inactive what if you have a reader that just never does any reads they're just gonna sit there like they created a handle to the map and maybe they did some reads in the past but now they've stopped doing reads the writer is going to just keep waiting keep waiting keep waiting and see the counters increase for all the readers but whichever ones are inactive this seems problematic what do we do about this so we're gonna pull another trick we're gonna say that you're gonna increment your epic counter twice you're already incremented once before you start reading and again after you've finished so what this means is that if you ever observe a counter to be even then you know that that reader is not reading the pointer you know that that reader is inactive so again we still have the guarantee that if you see it increased by one you're all good you can ignore that reader we have the additional constraint that if you ever see it being even you can also ignore that reader and the way to think about that is if that value is even it means that that reader is not currently looking at the map which means that if it were to look at the map it will see the new pointer therefore seeing it even is all we need so these two invariants combined gives us the guarantees that we wanted namely that the readers are taking no locks the writers are taking no locks all the read operations are entirely local to that thread and we still have safe access to both Maps right so writes can proceed at full steam reads can proceed at full steam so this is pretty good right remember that this was the graph for that we had for the reader/writer lock now we're gonna now do is look at the exact same experiment they're using this other map and we're gonna have the right swap the maps on every right operation so there's no amortize me or anything is just gonna swap on every single operation what we get is this this is on the same axis and you see that the rights are like a lot faster than the ones for you to read a lock they are still going down but that's sort of what we expect because there are more rights happening sorry there are more reads happening and so the writer gets to take their turn less often so that's as expected but it is doing a lot better but where are the reads well we have to zoom out a little bit so this is the reader/writer lock on a slightly different scale and you see that it's really not that good like reader/writer locks are terrible here's how we do can you see how much it matters to not use locks right in fact this is with 16 threads we get to do 25 million reads per second that's pretty good that's pretty good for a little data structure so this is actually a crate that you can use in your own projects this is a library that is put out there it's fully documented all of the internals are documented as well the benchmarks are open sourced there are seven lines of unsafe code in this entire library and the places where it's unsafe is basically to ensure that whenever we know that an invariant is true like for example we know that we have exclusive access to the left map as a writer then we have then we're willing to turn that raw pointer that we keep into a rust mutable reference because we know by by virtue of our invariants that this is now saved so that's pretty neat several lines of unsafe there are seven more lines of unsafe that avoids keeping both Maps you still keep two Maps but you deduplicate the data between the two maps but then you need another seven lines of unsafe but I feel like that might be okay so let's go back to the sequel I've given you sort of a micro benchmark here of how this performed with just that basic operation and compared to redirected locks but what happens at bigger scales so let's just look at this query for a second this is a relatively common query that you see a lot in web applications right you're just like counting the number of votes for a thing it happens in reddit and happens in hacker news it happens in lobsters it happens in all sorts of website to do this so what we did was we took this exact query and we ran it on my sequel we ran it on System z which is a database system I can't name because they have a benchmarking clause we ran it against pure memcache key so just memcache key obviously doesn't execute sequel but in memcache key we just do multi increments so we just use the inker operator in memcache key and multi gets so it's single round-trip for all the operations and of course that's sort of not really even a realistic benchmark cuz it's not persisting anything but let's just go with that and we also ran it on Nuria and Nuria uses this evie map this map that we just talked about the results are like this so the x-axis here is the number of requests per second we give the system and the y-axis is to the 95th percentile latency do we observe four requests notice that my sequel and system see fall over almost immediately I think the number there is somewhere around two hundred thousand operations per second memcache T gets to about seven million operations per second and we get to fourteen the difference between us and memcache T is surprising here right because we're actually doing full sequel execution but it's because our reads are just reading from cache and our cash just uses this data structure so there's no locking memcache she on the other hand does do locking on reads it locks the table set of uses and that's why we're about twice as fast as memcache T so it turns out that this kind of data structure optimization actually really helps so let's look at something that's not as micro as this this is still pretty micro single query so we did was we took the lobsters website which is sort of like hacker news the difference is that it is open source and we were able to get production traffic traces from it we took those and we built a workload generator that basically emulates user requests so it just loads different pages with the same kind of distribution as you see in the real work load and then we issue the same queries as loading that page would have issued what we see is this with my sequel you get around like 1,100 page requests per second you can serve with nor you get about 5,000 in the latest version of the code my sequel actually gets worse because it turned out that we were giving it an easier query than we thought so my sequel is really around like 800 or 900 this is not thousands right 800 900 page requests per second was on a 16 core server whereas Nuria gets to about 6,000 so this is a pretty significant speed-up that we see both from caching and from this additional data structure so I've taught you a lot about sort of data structures in performance now so let's go a little bit back to what this experience has been like this has all been to show you that this is a real system like this is not a joke it's a real system that is very fast and on real workloads so now let me try to tell you a little bit about what it's been like both to learn rust and use rust with this system so the first thing I want to tell you is rust is different you cannot go about learning rust the same way you learn all these other languages you've learned in the past it is true that its syntax is sort of relatively C or ml like like it's not that bad just to look at it but that's a little bit deceptive it is not just another C or C++ and go if you just try to get the gist of it and just like hack your own code you will hit a wall and that wall is the borrow checker and you might have heard this phrase of fighting the borrow checker that is what you're gonna end up doing you're gonna be spending a lot of time sitting there or the compiler yells that you're going you're not allowed to do that and you go why why and then you're just gonna be screaming into the void because you don't understand the model that's underlying it I've tried to give you some inkling of what it's like and it's not actually all that complicated but you do need to learn the model so the learning curve is basically learning borrow checking now there's a lot of depths to that but even just a cursory understanding is gonna get you pretty far I really recommend that you read the rust book so the Ross programming language book is an online book written by the rust team it is very good and you should read it start to finish this is not something we normally do when learning a language but with rust that is actually worthwhile doing it will save you so much headache and so much time there's also something called rust by example for those of you who sort of prefer learning by reading code rust by example goes through many of the same concepts as the rust book does but it also by giving you code examples every step of the way and you can compile and run the code samples too to try to figure out and sort of understand how tweaking the code helps these two tiny URLs at the bottom are links to resources for busy outlines for how you should go about learning rust they may or may not be useful to you but I recommend you check them out if you're interested the other thing is you really want to work with the language and this is not just apply to rust but I think it applies especially to rust if something feels hard if you get stuck on something it could just be because your program is wrong that is true but in many cases is because the language is sort of trying to tell you something like if you get stuck on Baro checking is probably because the approach you're taking might not be the right one so one example of this is if you end up with cyclic data structures so you want to represent say a graph for example and you try to represent it just by having pointers everywhere like you wouldn't see you couldn't totally do that you might have to either do one say for reference count me or something the alternative is store all the nodes in a list and then store the indices in the graph and now you have no cycles right so you're sort of fighting the borrowed tracker because you're trying to represent something in a in a weird way like you could use reference counting but you should you should really sort of second-guess your own design if the compiler is really fighting you on something in some cases the compilers just being stupid so there's a RC channel there also discord channels and guitar channels and all the other things but I'm pretty good the IRC channel this is called rust on Mouse net is really good there's also a rust beginners channel there if you get stuck like ask a question and someone probably from the rust core team will come back to you in like five minutes it's fantastic there's also a secondary argument to this which is if the compiler if borrow checker understands your code the it's a higher chance that someone else would understand it too if you start adding lots of unsafe annotations it's just like do lots of pointer type aliasing you can do that but chances are like six months later or three months later when you come back and look at this code you will not understand why you did what you did or what invariants are being enforced by the unsafe and no one else is gonna understand it either so if you can avoid unsafe and not have to fight the bar checker that's probably better I also recommend that you learn the isms of the language rust has some some things that other languages current languages don't really have so iterators is a good example of this so sort of ends up being a slightly functional pattern so those of you experience with Haskell might be more familiar them iterators are basically you can walk any collection any you think of it as a generator and Python and you can do really neat things just based on iterators that make your code a lot easier to read and write pattern matching is fantastic error propagation and rust and null pointer propagation and rust is really easy using the question mark operator which will become intimately familiar with if you start writing Russ code and they're just gonna save you a lot of time to learn them I wrote a blog post a while ago when I discovered many of these myself that you may find interesting if you start diving into rust there's also a really good tool built by the community called Clippy so Clippy is a tool that will take your rust code and tell you all the things you probably shouldn't be doing it's like basically someone wrote a thing that's gonna do peer review for you and it's gonna yell at you for like don't do that don't do that don't do that it's great it's like go vet for those who used to go i also recommend that you build crates so crate and russell is basically a library but they're really really easy to make new ones the overhead to making any one is really low the overhead for adding a dependency of one is really low and the reason is because it basically forces you to do separation of concern it forces you to write modular code it also gives you faster builds because rust compiles multiple crates in parallel and so it speeds up your build process and finally it gives you much better documentation and incentivizes you to write documentation for each module separately and rust has amongst other things are linked that you can turn on in the compiler itself in the language that says deny missing docs you set that up to top your library and now the come your program will not compile unless everything is documented it's fantastic the other thing that's really neat and that that I've observed over time in rust is that refactoring is really straightforward and this is in part because of the type system right you can decide to move a bunch of stuff around and then you've let the compiler guide you to all the changes you need to make and when your program finally compiles chances are it's basically right right because your types work out and that means that the code is all now in the right place you can do compiler guided refactoring and you can basically trust the compiler this is for those of you program in Haskell in the past is a similar kind of feeling of when it compiles it's probably pretty close to correct all right there's still gonna be bugs like trust me it'll still be bugs I also recommend you take advantage of the ecosystem because the dependency management is so good it is really easy to add dependencies on projects that other people are written there are a lot of really good libraries out there that you can just add a dependency on and it works the code formatting tools are good enough in rust that you can look at other people's crates and basically understand what they're like the formatting is consistent the documentation is consistent and because it's auto-generated for all published libraries and interlinked automatically it is really easy to just look up a crate you're gonna see its documentation you see how it fits together with yours and if you upload your library will be automatically interlinked with that crate that you depend on as well the ecosystem is still a little bit young so there might be some libraries you don't find but then maybe you can write that library I'm gonna leave you with this quote that I think is really good it's a little bit long I'm gonna read it out loud but you can also read it internally one of the value propositions most frequently lauded by Russ developers is its freedom from data races the compiler will literally not allow you to build code that could ever produce a situation where two threads can mutate the same data this teaches us two things loathing of the rust compiler and it's borrow checker until we make things work and to the extent to which we've written unsafe race run code in other languages I've learned a great deal from that second bullet point I look back at some of the code I've written in the past and I realized the only reason it worked is because some other library had guardrails in place or the lack of data resis was an accident in other words my like a failure is more attributable to random chance than skill I really feel like this when writing rusts code now that it has made me a better programmer I feel like I now see data races in my code and I realize that the compiler is about to yell at me if I go back now and write Python code or C code I noticed that there are data races were because I think all the Ruskin pod would never accept this program so therefore it must be wrong if I look at old code that I have written in other languages I realize that that code is probably broken I wish I had the russ compiler to tell me what was wrong with it right I I've tried to cover both sort of low-level details about Nuria and high-level details about Russ in general if you have any questions at all about any of the stuff I talked about today then feel free to follow up me either by email that's my email or I'm on github and Twitter and all the other things under the username on the right feel free to reach out to me or through Larry if you want and I'm happy to take any questions you might have thank you [Applause]

Info

Channel: Jon Gjengset

Views: 136,285

Rating: undefined out of 5

Keywords: rust, noria, research, lecture, concurrency, database

Id: s19G6n0UjsM

Channel Id: undefined

Length: 52min 54sec (3174 seconds)

Published: Sat Jan 05 2019