RustConf 2016 - A Modern Editor Built in Rust by Raph Levien

Video Statistics and Information

Captions
Um, actually, the first thing is how do you pronounce "xi"? "Zai" is kind of what I've settled on; everybody has a different idea. So yes, I'm Raph Levien, I'm at Google. This is actually my 20% project; this is not an official Google product, and to the extent that I express opinions, those are my own and not Google's.

So let me show you a little bit, just jump in and do a little bit of a demonstration. One of the things that I do a lot is open extremely large files. This file is 380 megabytes and it loaded pretty fast. I actually want to get that even faster by loading asynchronously, but that's not the way it works right now; it's a synchronous load, so it took maybe a second, maybe a little bit more. And then as I scroll through it I'm getting complete 60 frames a second, butter-smooth scrolling. That's the goal of this editor: extreme performance.

So let's talk a little bit about the shape of this project overall. It's hosted on GitHub under an Apache 2 license. It's not just open source but really a community-based open-source project. I started it a little bit more than six months ago, working on it just on my own for a couple of months, and took it public in late April. The code base is in a couple of different pieces. There's almost 10,000 lines of Rust code, which is the core, and then a bunch of libraries, and the libraries potentially have interest outside this project. Especially there's this rope library, which is the string representation I'll talk about in a lot more detail later, and that's almost half of the code. It's all in stable Rust, completely safe, zero uses of unsafe in the code. I thought the earlier talk was really interesting about the use of traits; the ratio of traits here is actually pretty high compared with some other projects, I do use traits extensively, and I'll talk about that in a little more detail. There are also twelve hundred lines of Swift code in this project, which is the Cocoa front end. It's a Mac app, although the intent is that I want to see more front ends, I want to see lots of different platforms that this runs on. So it's my 20% project, open to the community; there have been 27 contributors so far, and I'm hoping there will be a little step function, hopefully, as a result of this talk.

So the goal of this editor, as I opened with, is performance, and how am I achieving performance? There are a bunch of different ways you do that. The most obvious is: use a fast language. Hmm, Rust. And then I really want to use the best known techniques for doing all of these text editing primitives. If you look back 20 or 30 years ago when people were building these tools, multi-core parallelism didn't really exist, so you didn't really try to use it to solve problems. I haven't done a lot yet, but I want the architecture to support doing as much work in parallel as possible. Where I think things get even more interesting is using the most advanced data structures and the most advanced algorithms for manipulating text. The data structure is ropes, and as far as algorithms go, the most important thing is to do as little work as possible: when you do an edit, you do an incremental computation, updating only a few tiny things around that edit, get that on the screen, and don't recompute things.

But I think one of the things that makes xi most unlike previous editor projects is using asynchrony as a core defining principle. There are certainly projects it's inspired by, like Neovim, which tries to do that, but I think I've taken it a little bit farther. The goal is to never block on slow operations.

This is a little bit of a picture of the way the editor is designed. It's in lots of different modules, so it's really like a microservices architecture. The core at the center of it is a small server, and then there's a front end which is bound to your GUI platform, but the core is just sitting there responding to requests. The core owns the source of truth for the document, so if you open a very large file, that's in memory in the core, and the front end is only looking at a tiny window, only what's on the screen, so the front end doesn't have to deal with scaling to very large documents. Another part of this architecture is that a lot of the things you want to do, the plugins, are in separate processes, again wired together in this microservices style of architecture.

So I'm going to walk through what happens when you press "a": how does that actually end up as an edit on the screen, and how does it flow through this architecture? You start with a Cocoa event, insert text "a", and that happens obviously in the front end. That turns into JSON; these are just pipes, the core is just listening on standard in, and you see this JSON-RPC message. I'm showing kind of a simplified version of the JSON-RPC, but it's still pretty simple: it's an edit request, an insert, and the string you want to insert is "a". That is immediately reflected as a change to the document, which in this case is just a change to the string of the document. The next thing that happens is that the core dispatches, and this part happens in parallel: it sends out different messages to the other things that are subscribing to changes to that document. First it sends an update to the view: it tells the front end, display these lines, which represent the view of what's on the screen. It's also sending the syntax plugin very fine-grained information: it says this edit happened to the document, and where, and what changed, and why, because sometimes the plugin will care about that. So the view is now updated, and then the syntax plugin has a little bit more time to think about what color that should really be, and it sends a JSON message back to the core with these syntax highlighting spans. The core then updates its state, which contains both the string and these style spans, and also dispatches another JSON-RPC to the front end that says, okay, the view on the screen has changed a little bit, and then the front end displays that. So that's, step by step, what happens in this architecture.

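
For illustration, here is a minimal sketch of roughly what that traffic might look like, built with serde_json; the method and field names follow the simplified form described in the talk and are assumptions, not the exact xi protocol.

    // Sketch of the simplified message flow when the user types "a".
    // Method and field names are illustrative, not the exact xi protocol.
    use serde_json::json;

    fn main() {
        // Front end -> core: the keystroke, written to the core's stdin.
        let edit_request = json!({
            "method": "edit",
            "params": { "method": "insert", "chars": "a" }
        });

        // Core -> front end: only the lines currently visible on screen.
        let view_update = json!({
            "method": "update",
            "params": { "lines": ["fn main() {", "    a"] }
        });

        // Core -> syntax plugin: a fine-grained description of the change.
        let plugin_update = json!({
            "method": "update",
            "params": { "start": 14, "end": 14, "text": "a", "author": "keyboard" }
        });

        println!("{}\n{}\n{}", edit_request, view_update, plugin_update);
    }
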
So, a little bit about the front end. I wanted to explore the most modern ways of writing these kinds of apps, so the front end is written in Swift. Even though a goal is for this to work on lots of different platforms, I did not want to use a cross-platform GUI library, because I feel there's always some way in which it does not really look like a native app and doesn't really feel like a native app. So for each of these platforms I'm writing a fully native front end, and the front end contains all of the logic that is specific to that particular platform, that particular GUI. Another way I'm achieving performance is that the front end really only holds a tiny amount of state, so the things that bog editors down when there's a tremendous amount of state in memory don't happen here. The architecture is designed so that the front end never hangs on anything; it's always available to take a keyboard or mouse event.

When you do this asynchronous stuff, just saying, oh, things can be happening here and there at the same time, causes problems. If you have an edit that is being proposed by a plugin, either to do syntax coloring or something like inserting indentation, and you're typing at the same time, you get two edits that conflict with each other, or potentially conflict with each other. There's a lot of literature on how you resolve that, how you resolve concurrent edits coming from different sources. Usually people think of operational transformation in the context of a networked collaborative editor, and that's a direction this could go, because the engine for computing the operational transformation is pretty powerful and pretty general, but I'm really focusing on just solving the problem of concurrent edits that happen because of a keyboard and a plugin, all on the same machine. So you can definitely get these cases where things happen concurrently, and what you do is take that edit and transform it: you say, in order for that edit to make sense in the new state of the document, I have to adjust it slightly. They inserted a character here, so I'm going to take this and shift it so that it now fits the new state of the document. I'm going to write more about this, you'll hear more about it; I'd love to go into more detail here, but there's lots of stuff to cover. It's kind of a hybrid of traditional operational transformation and conflict-free replicated data types, which is a newer model, and as I say, I think there's potential for this to go a little bit deeper into collaborative editing. Another function of operational transformation is to handle undo: that's also a kind of nonlinear time travel, and you can get into the same kinds of conflict problems, the same questions of whether an undo really restores a consistent state, what you had before.

So another piece of the implementation is the rope data structure. What is a rope? A rope is basically a balanced tree where each of the leaves holds some piece of the string. The size of the leaves is bounded and the branching is bounded, which means that pretty much any editing operation you want to do is going to be O(log n) in the worst case in the size of the buffer. If you look at something like a gap buffer, if you're editing with a lot of locality you'll get O(1), but if you do something that has to move that gap, that can be O(n), so your average case is maybe pretty good but your worst case can get really bad. One of the advantages of a rope is that your worst case is always O(log n). In this picture you can see that in the nodes I'm storing the size, the number of bytes of string, of the children.

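
As a rough sketch of that shape (illustrative only, not the actual xi rope types, which are generic over the metadata stored in the nodes), a rope might look something like this:

    use std::sync::Arc;

    // Illustrative rope shape: leaves hold bounded string chunks, internal
    // nodes have bounded branching and cache the total byte length of their
    // children, so lookups and edits stay O(log n) in the buffer size.
    enum Node {
        Leaf(String),
        Internal {
            len: usize,               // total bytes in this subtree
            children: Vec<Arc<Node>>, // bounded branching factor
        },
    }

    impl Node {
        fn len(&self) -> usize {
            match self {
                Node::Leaf(s) => s.len(),
                Node::Internal { len, .. } => *len,
            }
        }
    }

    fn main() {
        let rope = Node::Internal {
            len: 11,
            children: vec![
                Arc::new(Node::Leaf("hello ".into())),
                Arc::new(Node::Leaf("world".into())),
            ],
        };
        assert_eq!(rope.len(), 11);
    }
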
For the implementation of this in Rust, I think here's an area where the goals of xi and the capabilities of Rust as a language really mesh beautifully. There's a generic tree implementation, and it's parameterized through traits: there's a leaf trait and a node-info trait, and you can plug those in. Right now I have three completely different specializations, for different ways in which I'm representing sequences and the computations I want to do on sequences, and I think this is going to expand; I think I'm going to use it to store things like incremental syntax highlighting state. I'm storing the string, I'm storing the line breaks, and I'm storing the rich text annotations in three different specializations of this tree. The theory behind it is that you really can represent any monoid homomorphism; I'll talk about that a little bit more in case people don't know what that is.

Another way in which the capabilities of Rust fit the needs really closely is that the API gives you an immutable data structure. Ropes really became popular in purely functional programming languages, because if you have a string buffer and you just want to append to the end of it, in a purely functional language you can't do that in place, and if you had to copy the whole string your performance would be terrible. That really pushed people towards the rope data structure: you have this tree, and yes, you have to build up a new tree, but you only have to allocate O(log n) nodes. That's good, but in Rust you can do even better. If you just want to mutate one of these ropes, and you happen to be the only one holding a reference to it, then there's this get_mut method on the Arc reference-counted container that lets you get a mutable reference and do the mutation in place, and the type system guarantees that everybody else sees an immutable value: if somebody else were holding a reference to that rope, it would make the copy, and you would not be changing state out from under them. If you were to implement this in a language like C++ you could get the performance, but we're doing a lot of aggressive things here, a lot of things where, if I were reviewing C++ code, I'd say this is too dangerous, too risky, too much can go wrong. In Rust you get this guarantee from the type system that the structure is immutable, but you're not being held to allocating everything, so you get the performance.

So what is a monoid homomorphism? A monoid is really a binary operator that has the associative property and an identity element. That's pretty abstract, so what are some good examples of monoids? A string is a classic monoid: the binary operation is string concatenation and the identity is the empty string. Another example is the integers, where the binary operation is addition and the identity element is 0. It's pretty obvious that both of these respect the monoid properties. So then what is the homomorphism? A homomorphism is a function from one monoid to another. A really good example: let's go from strings to integers, where the function is the length of the string. They're both monoids, and this string-length function preserves the binary operator, in the sense that if you do a computation on one side, that's accurately reflected on the other side.

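
A minimal way to see that property in code (purely illustrative; these are not the xi trait definitions):

    // The length function is a monoid homomorphism from (String, concat, "")
    // to (usize, +, 0): len(a ++ b) == len(a) + len(b). The same pattern
    // works for any per-chunk summary, for example counting newlines.
    fn newlines(s: &str) -> usize {
        s.bytes().filter(|&b| b == b'\n').count()
    }

    fn main() {
        let a = "hello\n";
        let b = "world\n";
        let ab = format!("{}{}", a, b);

        // Computing on the pieces and combining gives the same answer as
        // computing on the concatenation.
        assert_eq!(a.len() + b.len(), ab.len());
        assert_eq!(newlines(a) + newlines(b), newlines(&ab));

        // The identity element maps to the identity element.
        assert_eq!("".len(), 0);
    }
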
So when you go back to this picture of the tree, the leaves are storing elements of the first monoid, the string chunks, and the nodes are storing elements of the second, the computed values. You're not replicating the leaves, but at every point in the tree you know what that length is, and this generalizes to anything you want to compute that fits within the monoid homomorphism framework. Right now the main focus is on counting newlines: if you store both the string length and the newline count, then you can do a traversal of that tree, still O(log n), that gives you a correspondence between an offset within the file and a given line number, and vice versa. There are actually a lot of other interesting things you can do in this framework; I won't go into lots of detail now, but I think this is going to power some of the impressive performance improvements that I hope to do.

Another algorithm that is really important to get right, and really important to get fast, in an editor is word wrap, and one of the things I've done in xi is be very aggressive about making it incremental. We'll go back to the file here, and I'll do word wrap first as a bulk operation; that takes a little bit of time, but not too bad. And now that I'm here, as I type: this line is actually more than a megabyte long, and in almost any other editor you'd just rewrap the whole line; I mean, how else would you do it? But there's actually something more sophisticated. You can see down here it says I've touched 140 bytes when I did that, and it took 0.01 milliseconds, which is pretty fast. How do I do that? I start the incremental word wrap process at the line before the given edit, because it's possible for the edit to affect the line before the one where the cursor is, but it can't affect anything earlier than that, and it keeps going until it's able to resynchronize with the previous state it had, so a lot of the time it converges pretty quickly. The result of word wrapping is stored in a line-break data structure, which is just another specialization of the underlying rope; it's just storing line breaks. As I said, that initial word wrap is currently synchronous: you have to do it on the whole buffer before it responds again. I want to make that asynchronous really soon; I think the design supports it, it's just a little bit more tricky coding.

So, many editors do plugins by exposing bindings to a scripting language in the same process as the editor, so you basically get these data structures exposed: buffer and selection and cursor and so on and so forth. In xi I've decided to do it in a pretty radically different way. This is really microservices, and it's really the Unix philosophy: taking modules and wiring them together, so that each module gets to focus on doing one thing, for example syntax highlighting. These plugins communicate over a pipe with JSON-RPC, and there's a buffer protocol where the plugin maintains a window into the file: if the file is really small it just stores a copy, but if the file is really big it might store, say, a one-megabyte window, and then if it has to do a lot of processing it will do RPCs back and forth to get access to the rest.

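
A hypothetical skeleton of a plugin's side of that pipe, reading newline-delimited JSON-RPC from stdin and answering on stdout, might look like this; the method names are made up for illustration and are not the real xi plugin protocol.

    use std::io::{self, BufRead, Write};
    use serde_json::{json, Value};

    // Hypothetical plugin skeleton: one JSON-RPC message per line on stdin,
    // responses written to stdout. Method names are illustrative only.
    fn main() -> io::Result<()> {
        let stdin = io::stdin();
        let stdout = io::stdout();
        let mut out = stdout.lock();

        for line in stdin.lock().lines() {
            let msg: Value = match serde_json::from_str(&line?) {
                Ok(v) => v,
                Err(_) => continue, // ignore malformed input in this sketch
            };
            match msg["method"].as_str() {
                Some("ping") => {
                    writeln!(out, "{}", json!({ "result": "pong" }))?;
                }
                Some("update") => {
                    // A real plugin would apply the edit to its window of
                    // the buffer here and reply with fresh style spans.
                }
                _ => {}
            }
        }
        Ok(())
    }
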
This is working today; I can actually demo it. Again I'll open one of the files... oops, didn't mean to do that... this one. I'll make that a little bit bigger... why is this not... oh, there it is, make it a little bit bigger so you can see it. Then I'll do the syntax highlighting, and as I type... hmm, I said I wouldn't do live coding, and here we go. I should probably give that a name; if we had a syntax analysis plugin it would have given me an error message, but not yet. So this is working, and it's incremental, and it's using this operational transformation, so if the syntax highlighting were really slow and I continued typing, it would all still be valid and it would eventually catch up. The model of CRDTs is really eventual consistency: the idea is that when you stop typing, eventually the thing converges to the true answer.

This idea that you can have things talking to each other is inspired by a few efforts out there, including the Microsoft language server protocol, the one used for TypeScript, and I'm actually hoping to support that protocol directly, as well as the more specialized protocol I've built just for communication with things like syntax highlighting and indentation and so on.

The syntax highlighting module you just saw is based on Tristan Hume's syntect library, which is really a general-purpose library; I've just layered on the bindings to make it talk over this JSON-RPC protocol. It uses a regex-based approach; it's compatible with Sublime Text, it can just consume the Sublime syntax rules. The way it's implemented internally, it's using the onig bindings to the Oniguruma library, which is the regex library from Ruby. I've been implementing a Rust-based layer around the BurntSushi regex crate, because you do need to support things like backreferences and so on; it's not quite ready yet, but it's an interesting possibility to take this whole thing to Rust only, so there are no C bindings in there at all, and hopefully get better performance, because the BurntSushi regex crate uses really intelligent finite-state-machine techniques for even faster regex handling. So it's pretty fast; it's not the fastest syntax highlighting in the world, but it is a lot faster than the ones just written in JavaScript that you see people use.

One of the motivations for making plugins: I was looking at whether syntax highlighting should be something that's in the core, natively supported, or should be out there in these plugins. One reason I wanted it in plugins is that I don't think regex-based highlighting is the future. I don't think that's what I want to do in an ideal world, because if I just lexed the language and had a kind of loose grammar for it, one that understood when am I in a type, when am I not in a type, when am I in an expression, then you'd be able to do things like look at an angle bracket and say, is this a bracket or is this a comparison operator? I think that has the potential to be both faster and more accurate, strictly better across the board. And I'm kind of excited about the potential of having a syntax highlighting module be used in contexts other than just driving the xi editor's syntax highlighting.

So let me show you how that would work, what you would do. Let's build the highlighting plugin; it takes a little while to compile, no problem, and then I'll just run it. Now it's listening on standard in. Then I'll type in my RPCs: I'll give it a little bit of JSON, and what this JSON says is, I'm going to initialize a new buffer, and the size of that buffer is going to be 16 bytes. It sends back an RPC that says, give me the data for that buffer, with a maximum size of one megabyte; if that were a huge file, it would be saying, give me the first megabyte of that file. Okay, fine, sure, I pump in "fn main", good, and it says, here are your color spans. Then if I want to say, edit that, say start a comment after that, I just send another RPC that says edit, insert, here's where you're editing, here's the text you're adding, and sure, here are your new color spans. So it feels to me like a lot of things could potentially be talking to this plugin; it could be a really general service, and as we get higher quality and higher performance I think it could be a very valuable library. People need syntax highlighting as a service, it makes sense, you heard it here first. And given where Rust is going, you could even imagine compiling this to asm.js with Emscripten and deploying the whole thing to the web.

So the RPC, how does this work? It's based on JSON-RPC, and this is one of the most contentious parts; people say, why JSON, that's so inefficient, so slow. It's actually not inefficient: JSON implementations tend to get optimized like crazy, so if you just defined a binary protocol and started writing code, that would probably be slower than serde JSON, which has been through a tremendous amount of evolution and optimization. And any time you're writing a plugin and you think, oh, I have to do this RPC layer, well, doing it in JSON is very easy. You talk about batteries included: this is a double-A battery, not a CR123A, right?

The current code is using threads, and there's this thing where you're blocking on input, and you actually want to be a little bit more sophisticated, especially in the syntax highlighting plugin. While it's thinking, if somebody presses a key and an edit event comes in, you actually want to interrupt; you do the work in chunks, you don't want to do it a line at a time, but if you see an event coming in you want to be sensitive to that. So there's a method that says, is there a request pending, and you poll that, really, every time you do a line. The way that's implemented is that there's a separate thread blocking on input, and it sends a message over a channel to the thread that's actually computing your syntax highlighting, and then it's using Arc<Mutex> all over the state: any time you need access to some piece of data, you go through an Arc<Mutex> to get it. And that's not the future. The future is that I would really love to replace this using the futures library, which we just saw, and that would have some pretty significant advantages, because just moving an object from one thread to another thread, and crossing the Linux syscall barrier to do that, is between five and ten microseconds, so by doing it with futures you actually save that overhead altogether.

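
A sketch of that thread-and-channel arrangement, under the assumptions above (illustrative, not the actual xi plugin code):

    use std::io::{self, BufRead};
    use std::sync::mpsc;
    use std::thread;

    // One thread blocks on stdin and forwards each request over a channel;
    // the worker does its highlighting in chunks and polls for pending
    // requests between chunks instead of blocking on input itself.
    fn main() {
        let (tx, rx) = mpsc::channel::<String>();

        // Reader thread: blocks on input so the worker never has to.
        thread::spawn(move || {
            let stdin = io::stdin();
            for line in stdin.lock().lines() {
                let line = match line {
                    Ok(l) => l,
                    Err(_) => break,
                };
                if tx.send(line).is_err() {
                    break;
                }
            }
        });

        loop {
            // ... do one chunk of work here (e.g. highlight a few lines) ...

            // Then check whether a request arrived, without blocking.
            match rx.try_recv() {
                Ok(request) => println!("handling request: {}", request),
                Err(mpsc::TryRecvError::Empty) => {}
                Err(mpsc::TryRecvError::Disconnected) => break,
            }
        }
    }
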
Another thing I'd like to do is this idea of the future: the idea of saying, okay, here's a request to do something, and I'm not ready with the answer yet, because maybe it's an RPC over there or maybe it's a slow computation. The right model is: here's a request, and the result is a future holding the result. Right now this is kind of coded by hand, but I think that can be an organizing metaphor, and I want to refactor things so that the xi core can be more embeddable in other apps and not necessarily even dependent on JSON-RPC; that just becomes a detail.

Switching gears a little bit, another component of xi is the Unicode library. Right now the main thing in there is the line breaking algorithm. UAX #14 has all the rules about things like, if you have a combining character that's not a line break, but if you have a space it is, and emoji actually have a whole set of rules; it's very complicated. The industry-standard implementation of this is ICU, which you see almost everywhere, and of course I wanted to do it my own way. At the heart of it, what I built is a state machine: it just runs through the string, a character comes in, what Unicode class is it, advance the state machine by one state, and then it says this either is or is not a line break opportunity. The implementation turns out to be three times faster than ICU, because I've focused relentlessly on what this thing really needs to do, and the API is designed to support incremental breaking. You get an iterator, a very natural, very clean interface, so you only run it as far as you need to, and you can initialize it with some state, which might be in the middle of a line, and run it only as much as you need.

I'm hoping to do more with Unicode; there's a bunch of interesting problems that need to get solved. I think the most interesting of these is case-insensitive find, and this is actually where it touches the homomorphism stuff. If you're doing case insensitivity for completely arbitrary languages, that's a really tough algorithm, because there are these case transforms and different normalization forms. But 90% of the time, especially when you're dealing with a really big dump of something, like that ninja file I opened before, it's ASCII. So I'm hoping to use a homomorphism to compute a sort of difficulty level, and say ASCII is the simplest difficulty level, and then there's more metadata that might tell you, oh, you have to use these more complex, more expensive algorithms to do the case transformation. So when you're doing search, you go over the tree and say, for this whole subtree I can use the really, really fast path, just run over the bytes and AND with NOT 0x20, I guess, to do the case transformation, and if it's a complex language that needs more complicated case transform rules, you say okay, go the slow path.

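
One way that difficulty-level homomorphism might be expressed, sketched here with made-up names (this is the idea as described, not code that exists in xi):

    // Hypothetical sketch: classify each chunk of text by how hard it is to
    // case-fold. Combining classifications with max() is associative and has
    // Ascii as its identity, so it fits the monoid homomorphism framework
    // and could be cached in the rope's internal nodes.
    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
    enum FoldDifficulty {
        Ascii,  // byte-wise case transformation is enough
        Simple, // needs one-to-one Unicode case mapping
        Full,   // needs full case folding / normalization
    }

    fn classify(chunk: &str) -> FoldDifficulty {
        if chunk.is_ascii() {
            FoldDifficulty::Ascii
        } else if chunk.chars().all(|c| c.to_lowercase().count() == 1) {
            FoldDifficulty::Simple
        } else {
            FoldDifficulty::Full
        }
    }

    fn combine(a: FoldDifficulty, b: FoldDifficulty) -> FoldDifficulty {
        a.max(b) // associative, with Ascii as the identity element
    }

    fn main() {
        let chunks = ["fn main() {", "grüße", "İstanbul"];
        let worst = chunks
            .iter()
            .copied()
            .map(classify)
            .fold(FoldDifficulty::Ascii, combine);
        println!("use the {:?} search path for this subtree", worst);
    }
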
I also want to make sure that the stuff inside this Unicode library ends up where it makes sense. There are some Unicode functions in the Rust standard library, although I think the tradition now is that you put things like this in a crate, like the unicode-rs crates, rather than in the rust-lang core crates, so I'm talking to the people there, and I want to make sure that the improvements that make sense to go in there do go in there.

And that really brings us to this whole question of community involvement. This has actually been one of the more gratifying things about working on this project. Within a week of putting this on GitHub I got two prototype front ends that worked; I think that was the GL one and the Windows-based one, and those are coming along. And then the syntax highlighting: I was getting pretty far down the line of writing syntax highlighting myself, and Tristan Hume came up with this syntect library, and it's like, hey, it's done, that's awesome. So I have 27 contributors total, with lots of different features and fixes and improvements, and lots of really great discussions that go on in the GitHub issue tracker as well. In the process of doing this, for example, serde JSON didn't escape control codes correctly, and that was a really cool interaction: here's a fix, a little discussion of whether this is the best way, whether it's going to regress performance, and then it got merged. Some of it, like Unicode property lookup: I use tries, which are faster than binary search, so there's a pull request to rust-lang to make Unicode property lookup faster. And then the line breaker: it was really good timing, because Servo had a thing that just split on spaces, it didn't really do the Unicode rules right, and I said, hey, do you need this, and they said, yeah, this would be great, so getting that integrated into Servo happened within, I think, a week or two of the project even going up on GitHub at all. I've really enjoyed being part of the Rust community.

So I hope we have some time, I think we do, maybe a little bit, for questions. Let's see, we'll take one question; we're actually a little bit over time already, but I'll give you one question, so you get to pick who that is. Oh my gosh, I think your hand went up first.

All right, now I'm nervous, because they might have better questions. I'm actually really curious about the scrolling of the 300-meg file. Could you just describe the protocol to actually scroll and refresh? Is it like deltas you're shipping back and forth, or what?

So that's actually interesting; I was debating whether to have a slide for that. The thing is that the core is maintaining the state of where that scroll region is, so when you're not scrolling, if there's an update, it sends that screenful of information, so it's at most one screenful. As you scroll, obviously you've got state that lives in the core that doesn't live in the front end, so at that point it sends one of the few synchronous RPCs in the system, and it says, give me the rendered view for this region, with all the line breaks and the style spans in there. That's an RPC that goes out and comes back as JSON lines with spans and gets displayed. I won't show it, but there's debug information showing that's almost always less than one millisecond round trip, so even though it is synchronous, it's not slowing down the scrolling process; you're sending a screenful of information, which is not that much. Oh, so there's a window? Yeah, no, it's more sophisticated than that,

it's like a delta of what you actually need, and it's chunked, so you don't do it every line; it's more like three screenfuls at a time. Awesome, thank you so much.
Info
Channel: Confreaks
Views: 40,222
Id: SKtQgFBRUvQ
Length: 36min 16sec (2176 seconds)
Published: Tue Oct 04 2016