Rust at speed — building a fast concurrent database
Video Statistics and Information
Channel: Jon Gjengset
Views: 136,285
Keywords: rust, noria, research, lecture, concurrency, database
Id: s19G6n0UjsM
Length: 52min 54sec (3174 seconds)
Published: Sat Jan 05 2019
Hey Jon!
I spoke about Rust and Postgres at this exact venue just a few months ago, organized through the RustNYC community though. It bums me out that I wasn't aware of your talk about Noria because I would have definitely gone (not sure whether this was a public event?).
You're generally not in the NYC area, right?
Is anyone working on a Postgres adapter for Noria yet?
I really like the idea of maintaining the materialized views live.
I still remember a very simple task that I just couldn't manage to get to work with acceptable performance on Oracle (v11?): a simple, frequently updated table, folders, containing essentially 4 columns: folderId, actedOn, messageId, publicationTimestamp.
How fast is
select count(*) from folders where folderId = ? and actedOn = 'N';
with an index on folderId/actedOn? It's O(N) in the number of rows matching.
Maintaining a folderId <-> count mapping was impractical: it caused high contention on the most popular folders. I never got to attempt the idea of a folderId <-> count mapping with 10 or 20 rows per folder, using randomized distribution of the increments/decrements to reduce contention by taking advantage of row-level locking; in the end we just axed the idea of an accurate count for folders with more than 1,000 messages and would display 1,000+ instead.
So... how do you think Noria would manage on this benchmark:
folders table with folderId X and actedOn 'N'.
N to Y (never in the other direction).
select count(*) from folders where folderId = ? and actedOn = 'N'; to be for folderId X?
(Assuming MVCC, if the count returns 3 and we select the messages, we want 3 of them, not 2, not 4)
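The multi-row counter idea from the comment above (several count rows per folder, with each increment or decrement sent to a randomly chosen row) can be sketched in Rust, with in-memory atomics standing in for database rows. This is only an illustration under assumptions: ShardedCounter, pick_slot, simulate, and the slot count of 16 are hypothetical names and values, and nothing here reflects how Noria or Oracle actually maintain counts.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

/// A counter split into several slots so that concurrent writers rarely
/// touch the same one. Each slot stands in for one "count row" of a folder.
pub struct ShardedCounter {
    slots: Vec<AtomicI64>,
}

impl ShardedCounter {
    pub fn new(n: usize) -> Self {
        assert!(n > 0, "need at least one slot");
        Self {
            slots: (0..n).map(|_| AtomicI64::new(0)).collect(),
        }
    }

    /// Pick a pseudo-random slot. A freshly seeded std hasher is used as a
    /// std-only stand-in for a real random number generator.
    fn pick_slot(&self) -> usize {
        let r = RandomState::new().build_hasher().finish();
        (r % self.slots.len() as u64) as usize
    }

    /// Apply +1 (message arrives) or -1 (message acted on) to a random slot.
    /// A single slot may go negative; only the sum over all slots matters.
    pub fn add(&self, delta: i64) {
        self.slots[self.pick_slot()].fetch_add(delta, Ordering::Relaxed);
    }

    /// Read the count by summing every slot: O(number of slots),
    /// independent of how many messages the folder holds.
    pub fn total(&self) -> i64 {
        self.slots.iter().map(|s| s.load(Ordering::Relaxed)).sum()
    }
}

/// Usage sketch: several writer threads add and remove messages concurrently.
pub fn simulate(writers: usize, adds_per_writer: i64, removes_per_writer: i64) -> i64 {
    let counter = ShardedCounter::new(16);
    thread::scope(|s| {
        for _ in 0..writers {
            s.spawn(|| {
                for _ in 0..adds_per_writer {
                    counter.add(1);
                }
                for _ in 0..removes_per_writer {
                    counter.add(-1);
                }
            });
        }
    });
    // All writers have finished here, so the sum is exact.
    counter.total()
}
```

While writers are still running, total() is not a consistent snapshot, so on its own this sketch would not give the "count returns 3, select returns 3 messages" guarantee the comment asks for.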
Great talk. Just wanted to say that you have found a great way to structure it. It's not easy to "squish" that much information into such a short amount of time. It was really helpful and I am eager to read the paper now. Turns out evmap was also very interesting to me, because I don't think there is much choice of concurrent hashmaps in Rust currently, at least I think that is the case. May I take the opportunity to ask if there is a way in evmap to have an insert followed by a refresh in such a way that I can't forget to call both, like in an insert_refresh method? I see many benchmarks use this pattern, and I think you mentioned in your talk, when comparing Noria to other databases, that this was the pattern you used for the comparison. I think it would just be more convenient. I couldn't find it in the docs. Another thing that could be quite useful with the way you structured evmap is something like transactions for maps. Having something like a discard method that would discard all inserts (and updates etc.) back to the point of the last refresh (so basically a copy of the current read map) sounds like a useful thing to have given the way evmap works. Anyway, thanks for this talk, your streams (especially the async/tokio one), the crates, and the overall work you've done!
Good video, looking more into this DB 🤔
Oh hey, it's you! Really pleasant talk, thanks! I wrote a small program using fantoccini just yesterday, so thanks a lot for that library too. The docs helped a ton.
Holy shit. That double map / deferred write is pretty cool and easy to understand.
I'm not really a computer scientist or programmer, but I'd like to ask if this idea is the same as the read-copy-update exclusion I've heard about in the Linux kernel?
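For readers following the double map / deferred write discussion, here is a toy, single-threaded Rust sketch of the bookkeeping: two maps, a log of pending operations, and a refresh that swaps which map is visible. It is not evmap's implementation and it leaves out everything that makes the real thing safe under concurrency. DoubleMap and its methods are hypothetical names, and insert_refresh and discard are included only to illustrate the two suggestions from the earlier comment, not because evmap is known to offer them.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// An operation that has reached the write map but not yet the read map.
enum Op<K, V> {
    Insert(K, V),
    Remove(K),
}

/// Toy double map: reads see one map, writes go to the other until refresh.
pub struct DoubleMap<K, V> {
    maps: [HashMap<K, V>; 2],
    read: usize,            // index of the map that reads currently see
    pending: Vec<Op<K, V>>, // writes made since the last refresh
}

impl<K: Eq + Hash + Clone, V: Clone> DoubleMap<K, V> {
    pub fn new() -> Self {
        Self {
            maps: [HashMap::new(), HashMap::new()],
            read: 0,
            pending: Vec::new(),
        }
    }

    /// Reads only ever look at the read map.
    pub fn get(&self, k: &K) -> Option<&V> {
        self.maps[self.read].get(k)
    }

    /// Deferred write: applied to the write map and logged for later replay.
    pub fn insert(&mut self, k: K, v: V) {
        self.maps[1 - self.read].insert(k.clone(), v.clone());
        self.pending.push(Op::Insert(k, v));
    }

    pub fn remove(&mut self, k: K) {
        self.maps[1 - self.read].remove(&k);
        self.pending.push(Op::Remove(k));
    }

    /// Swap the maps so reads see the writes, then replay the log onto the
    /// map that just became the write map so both copies agree again.
    pub fn refresh(&mut self) {
        self.read = 1 - self.read;
        let stale = &mut self.maps[1 - self.read];
        for op in self.pending.drain(..) {
            match op {
                Op::Insert(k, v) => {
                    stale.insert(k, v);
                }
                Op::Remove(k) => {
                    stale.remove(&k);
                }
            }
        }
    }

    /// Hypothetical convenience: a write that is immediately visible.
    pub fn insert_refresh(&mut self, k: K, v: V) {
        self.insert(k, v);
        self.refresh();
    }

    /// Hypothetical rollback: drop every write since the last refresh by
    /// resetting the write map to a copy of the read map.
    pub fn discard(&mut self) {
        self.pending.clear();
        let snapshot = self.maps[self.read].clone();
        self.maps[1 - self.read] = snapshot;
    }
}
```

Cloning the whole read map in discard keeps the sketch short; a less wasteful version would undo only the keys named in the pending log.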
Hi, I really like your work Jon and I believe I saw it in the past on reddit.
Firstly, the adaptive element of this is amazing and the cache principle is very good compared to memcache/cache-stampede/cache-invalidation of the past.
However, as a DBA/data-performance-engineer, you cannot really compare to an actual materialised view that someone took the time to data model well.
If I set up a manual materialised view on MySQL (with a refresh every hour or every 5 minutes, or trigger-based), it might actually beat those 14 million reads. Maybe if I throw ProxySQL in there with a TTL cache, it could.
Not only that, but you're half way to setting up a data mart/data warehouse as well.
I regret that MySQL, and to a large degree Postgres, do not really have materialised views like Oracle and MS SQL. I believe that had they done so, we would not have seen 2/3 of all the middleware/caching we are dependent on today in the open-source world.
Hopefully, one day someone comes up with a Rust-based database that has fast materialised views. And they will probably use your evmaps.
That was an amazing talk! And as always with your Rust content, I found myself learning stuff once again -- you're great at explaining complicated things.
Oh, and for additional context, this is the original Reddit post for evmap, and this is the one for Noria.