Building a collaborative text editor with WebRTC and CRDTs

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

[Applause] hello my name is Nitin and today I'm gonna walk you through my journey building a real-time collaborative text editor from scratch and JavaScript while there is some JavaScript in my presentation I'm primarily going to focus on the engineering challenges we faced and some of the decisions we had to make along the way so I didn't build conclave on my own I had two great engineers with me Elise out in the Bay Area sun-lee in New York City and myself in Minneapolis we're obviously a remote team and as we were working on another project we were trying to find it we found it very difficult to program together to kind of pair program or program the three of us together using just screen share and video chat so we started looking into existing collaborative editing tools and while doing that we decided to explore what it would take to build one on our own so towards the end of my talk I will show a demo of conclave but for now this is kind of it looks like it's a web application it's hosted on Heroku you open the site and you're given kind of a blank document if you're the first person to open the site you're given a sharing link which you can share with your collaborators and when they click that link to automatically connect it to your document you can upload an existing document you can save your current document it kind of looks like a basic version of Google Docs at this point so before I get into what a collaborative text editor is I'm going to talk briefly about what a text editor is so it's obviously just an ordered list of characters each character is going to have a character value like the letter or tab or a space and numerical index that represents its position within that ordered list I mean there's two basic operations you can do with text you can insert characters and delete characters so let's say we have the word cat you have three letters they're three values and three indices 0 1 & 2 let's say you wanted to insert and o to make a coat what you would do here is have an insert operation with two parameters the new character value and the position delete delete' is even simpler just the positional value I'm not remove the character the important thing to note here is that when you're inserting a character or deleting the character you don't just affect the character you're trying to insert or delete potentially you have to affect every other characters position within that document so what's a collaborative text editor the difference is we want multiple people to edit the same document at the same time so we'll have to provide a copy of the document for every user so they can have that open up in their browser and when we say real-time we want it to be real-time so we want them to be able to make edits on their document as if they're using the basic text editor we don't want to have to wait to see if that change is allowed or we don't want any kind of waiting we want that edit to be applied immediately an example of a non real-time editor would be something like git where you can't have multiple users editing that same document but there is that extra merge that has to happen at a later time so to allow this to happen I'm gonna set up a basic architecture for now so our initial architecture where we have a central server in between all of our users it's pretty much responsible for relaying operations so as a user makes a change to their document it's got to get relate to all other users collaborating on that document and it's gonna go through the central server to do that so with that that's our that's our you know basic collaborative text editor let's see how it works so we're gonna do the same operations as before except with two users this time we're gonna have user 1 insert and user to delete so user 1 inserts the yellow position 1 sends that to the central server user - deletes character position 1 and sends that to the server as you can see we have a problem one user ends up with cat one user ends up with cought the reason might be fairly obvious here it's because the orders were operations were applied in different orders and so you got different results this is kind of the main problem that you run into how do you allow users to make changes to their local copies and ensure that their copies all converge and stay consistent here's another example let's say both users start with cat again and anyone delete the first character they send a delete operation with at position 1 and they send an operation to the central server what happens here is we have this extra delete that happens the characters already been deleted but the other user doesn't know that yet so they delete the delete another character and while we get the same document State on both sides we've lost data and that's bad it's really bad so this is the basic summary of our problem we've now seen it a collaborative text editor cannot just work like a basic editor and the reason is it because it doesn't satisfy two key properties so the first is commutativity which means that no matter what order operations are applied in you should result you should end up with the same result so to take numbers for example addition is commutative because a plus B equals B plus a the subtraction is not and in idempotency which means that if you apply an operation repeatedly you should get the same result so for numbers again if you multiply a number by one repeatedly you'll still get that same result so now we're going to talk about how we went about solving this problem one possible solution is called operational transformation I'm going to call it OT for short here it's purely an algorithmic solution to the problem and the way it kind of works is it sits in the middle between all the operations and as they're coming through it decides if the operation can be applied as is or if it needs to be transformed or modified before being applied so let's see how this works here as those operations are on their way to the other users through that central server ot intervenes and says ok since user 1 has inserted a new character I need to transform that operation so it goes ahead and transforms that before it applies it the other operation coming through is fine it can apply it as is and we get cat caught on both sides the other example is even simpler if the operation comes through and it's the same operation it just skips it there's no need to apply it so both users get a t as they want it so when theory OT works provides an effective strategy for resolving conflicts between different copies of a document and it was actually the first popular way that was used to solve this problem Google Wave which was a precursor to Google Docs and fire pad and etherpad which are very popular open source collaborators all use OT to solve this problem the conclusion however is that it's very difficult to implement in practice here's a quote from one of the engineers of Google Wave unfortunately implementing ot sucks there's a million algorithms with different trade-offs mostly trapped in academic papers the algorithms are really hard in time-consuming to implement correctly wave took two years to write improve and if we rewrote it today would take almost as long to write a second time so it's very difficult and it seemed like the difficulty stems from its purely algorithmic approach fortunately there's a there's an easier way that we were able to use so as researchers tried to simplify ot they discovered a new method called conflict-free replicated data types in a nutshell the insight is you don't have to treat characters like I described on that basic text editor is just a value and a position you can kind of represent them as more complex data types add more properties to the data structure and the trade offer though the benefit you get is that the algorithm is much much simpler so 30t is just a strategy it's been used across different types of distributed systems I'm gonna focus on how it works for collaborative text editing so to achieve that commutativity and item potency that we wanted there's two key properties that we need to solve satisfy we need all our characters to be globally unique and we need all our characters to be globally ordered so the for the first one to make them globally ordered we assign site ID which a unique side ID and site counter properties to every character that we created and the key is that if you increment that site counter whenever you create a new character then you ensure that every character in the entire system is globally unique so let's see how this changes our early example so for the purpose of example I've simplified the kind of global IDs to be just integers the keys that they're unique and now when you do an insert or delete when you do delete you don't have to you don't delete by the position you delete by the global ID and that way when that delete operation gets sent to the other users there's no way they can delete that same character because it doesn't exist there's only one version of that ID so we've got that item potency that we wanted now before I talk about that second property I want to quickly review this so the point I made here was that when you're inserting or deleting a character in a basic text editor you have to you potentially have to shift every other characters position and that kind of leads to our operations being order dependent we can avoid this by using what's called fractional indices as opposed to numerical indices so the key is here that when a user wants to insert an O between the C and the T a position one it doesn't actually care that it's a position one it just wants to have it just wants to be between the C and the a and so we can insert it anywhere between the CA ne between anywhere between 0 and 1 let's say 0.5 and we never have to shift the surrounding characters so we're inserting at a fractional point and so we don't have to move any other characters positions the trade-off you make is that that position is no longer just an integer it's a list of values now so it can grow in size so the trade off this space that you're making there there's a lot of academic papers that talk about how to you know reduce the growth of this trade off we use an ID an adaptive ID allocation strategy called L SEC which does a good job of minimizing the growth of space with collaborative text editing so let's see how this changes our other example so now when you insert a character between the C and the a it's gonna use a fractional indices so it's gonna insert a 2.5 in this example so the key is that no matter what happens on the other users side it doesn't affect the fact that the oh is going to be placed between the CM a it's always going to be in the same position on all users documents and the order of operations no longer matters so some things up while a basic text editor and operational transformation treated our characters as just you know the valid character value and additional index we now the more complex data type for a character objects we have a side ID which is a u UID a unique user ID that we generate site counter we have that same value and we have a position identifier that supports fractional indices so I want to revisit the application architecture now initially I said we started with the central server and that was used primarily to relay operations between our users but as we got our collaborative editor working we wondered if we could change this to improve our application before I do that want to talk about some limitations with our current design the first one is that there's an unnecessarily high latency between our users so since all operations are routed through that server let's say if you have two users sitting in California while the server is out in New York even though they're just milliseconds apart in distance every operation they send has to go through that server and that slows things down a lot the second limitation is it can be costly to scale this kind of architecture as the number of users grows the resources you need increases and the cost will go up accordingly the third limitation is that you have to trust that central server you got a trust application developers us in this case whoever hosts the application Heroku in this case and potentially the government with your data and finally I realized on a central server creates a single point of failure so if that server were to go down our users can't collaborate so we can remove these limitations by using a peer-to-peer architecture instead so in this case all of our peers act as both the client and the server and they're responsible for relaying operations that they receive kind of I think they call this the Gossip protocol to send our messages between users we use a protocol called WebRTC it was primarily designed for plug-in free audio and video Colleen so you can do browser video and audio calling without installing plugins we were able to use it because it also has a pretty cool data channel that allows you to just send text messages between browsers and so that's what we used the important thing to note is that there is still a small server that's required for WebRTC in space it's a call to signaling server and it's required to initiate connections between users but it's important to note that even though that server is required no user data has to travel through that it's just for the initiation of the connection so users can find out where each other is located and after that it's actually no longer needed so this is a super basic representation of what happens in signaling when the user first opens the application it registers with the server through a WebSocket connection and pretty much tells the server works locate its IP address with some other information the signaling server assigns a unique peer ID to that user we use that peer ID in our sharing link so since that peer ID is unique our sharing link is unique as I said before it allows us to provide like a user can provide their sharing link to another user when the user clicks that sharing link to automatically connect to that user so it's essentially up kind of a pointer to each user so in the real world most users IP addresses are not publicly available so one thing we use is pretty common in these kinds of applications like VoIP and any kind of direct connection between people stun servers so let's say you go to Google and you type in what is my IP it'll tell you what your IP is it's using probably pretty much a stun server to send a signal and get back your public facing IP address the second thing that we use that's a little different is a turn server so in case users cannot connect for whatever reason and it can establish web RTC connection they can fall back on a turn server which pretty much relays operations like our central server had been doing it's actually implements the stuff I talked about we use PJs which is a really cool library that takes care of a lot of the signaling and initiating connections behind the scenes for us and the last thing to note is that the data channel WebRTC relies on the UDP protocol which means that unlike TCP the order of operation the order of delivery is not guaranteed so your messages might arrive in a different order than they were sent this can cause a problem so let's say you have three peers collaborating on one document two of them are right next to each other and the one is much further away and we have peer 1 insert an A in their document since peer 2 is nearby it quickly receives that operation but decided doesn't like it so it deletes it and then sends that operation out so now the insert and the delete are on their way to peer 3 the question is what happens if the delete arrives before the insert we can't apply to delete first because there be nothing to delete and when the insert did arrive peer 3 would have a different document to be there too so our whole our whole plan of ceará duties would just stop working so to solve this we built what's called a version vector it sounds fancy but it's pretty simple it just ends up tracking the operations that appear has received from all of their users so when an operation is broadcast out in addition to the type of operation which is an insert or delete and the character object which is what I described in the first part it also it also sent the version and as part of that version it sends the site ID and site counter which are those those properties that make a character in operation unique so let's go back adding our version vector to the to the to the diagram here so this one is Pier threes version vector it's saying that I'm connected to two piers pier 1 and pier to the latest operation I received from Pier 1 is 23rd and the latest operation I've ever seen for pier 2 is the sixth operation so now when pier 3 first receives that delete operation we don't apply it we first place it in a buffer a deletion buffer if it was an insert we could apply it immediately but because it's a delete we have to wait at the end of every operation we go ahead and process that buffer whether it's an insert or delete we processed that buffer so pier 3 looks in their buffer and tries to process it the first character to be deleted is that site 1 with the counter of 24 it looks in its version vector it sees the latest operation I received is 23 so I'm not ready to delete this I'm gonna keep it in there after some more time that insert finally arrives at pier 3 this time the buffer is processed again the version vector was updated and now pr3 knows I can delete that character and now all our piers are in the same consistent state like we wanted so most of what I just described is laid out in this code snippet here as you can see if the operations insert we can apply it immediately if it's delete we throw it in the buffer and at the end of every operation we process that buffer and then at the end of every operation also we relay that operation out through the gossip protocol to all the other piers the one new thing here is this guard clause at the top so we can also use the version vector because in the gossip protocol because multiple peers are connected to other peers you might receive an operation that you already applied and so we can use that version vector to see if that's the case and if it is we can skip that operation so with that here's our final architecture within every instance of our application we have the CR DT in the version vector which works together to make sure our document states stay consistent we have our messenger which is responsible for sending and receiving web RTC messages the one piece I haven't talked about is our editor that allows our user to interact with their CR DT data structure we use the code mirror library to do that and it pretty much just indicates the cursor position when a user makes changes to their document since all this is since it's a peer-to-peer application all the logic has to happen in the browser so it's all client-side JavaScript we do have a small node server in the back that serves the assets the JavaScript files HTML CSS but it's mostly heavy client-side JavaScript so as we began to actually use conclave in practice we found a lot of things we can improve so I'm gonna talk a little bit about that what we did there to improve it the first thing we noticed is our editor started to really slow down as the document grew in size that was pretty clearly because we were storing all of our character objects in one giant array and as that document grew in size splicing in and out of that write took a lot longer it's all OVA nations so are four major operations which are local inserts and deletes and a remote inserts and deletes all have to supply some of that one giant array so it's slow we also realized that the editor library we were using was indicating cursor position with the line and a character value so for instance if we wanted to insert another character after that T and document there the character cursor position would be line two character eight but we were transferring we were converting that to a 1d index to 30 so that we could splice it into our array and then having to do that conversion in Reverse on the way back so we could with this discovery we could we figured out pretty obviously that we could improve our performance by storing our character objects the same way that our editor was essentially with lines and characters and theme each lines with an array of arrays and with that we could have put me in fact eliminate that conversion and also we're no longer dependent on our performance is no longer dependent on the total number of characters in the document and more dependent on the number of characters in a specific line that we're inserting or deleting for remote operations we do have to find the line that that character exists in so that binary search adds a log of L factor there but it's it's a lot faster than than the o of n that we had before we wrote some performance tests on our CR DT to see how much faster as you can see it's pretty it's quite a bit faster the units are small here but for our rate for our one giant array for a hundred thousand operations on our co duty it took about 14 seconds and with array of arrays one hundred thousand operations is down to less than a second so it's a huge improvement at least on the level of our co duty now this doesn't take into a fact our editor it's just the co DT data structure but still a lot a lot better our other major opposite optimizations had to do with how we managed our peer-to-peer connections so while WebRTC allows users to connect directly the developer is still responsible for managing and distributing those connections when I use the word network here what I mean is I'm referring to the network of users collaborating on a specific document so five peers were collaborating on a document that's the network of users for that document so really early problem we ran into is we'd have three you just collaborate on a document we'd have peer one who would send out their sharing link to two other users pure to impure three would click that sharing link and connect to peer one and so the way peer 2 and P 3 are connected was not directly but through peer one so the obvious question is what happens if peer one leaves P 2 and P R three like that central server a single point of failure can no longer collaborate so our solution was to create a way for our peers to discover each other and we do that through a network list so while every user might not be connected to every other user in the network they are aware of that user they are aware of every other user and they have access to that list so now adding a network list for every user when peer one leaves they get crossed off that network list and P R 2 and P 3 can then look to see who else is in the network list use it to connect and continue collaborating so this works and with using this case we'll always have an ability for if there's users in the network users to discover each other and continue collaborating well we wondered if we could proactively avoid this type of failure because there is that that time where users aren't connected we want to avoid that time so in our initial design I talked about how when a pier opens a document they get a sharing link and they send that sharing link out to a number of users so a typical use case might be you know you have a zoom meeting and you want to collaboratively edit some meeting notes so the organizer of the meeting sets up the meeting also sets up a conclave instance and sends that link out to the users they all click that link and they all connect to peer 1 the problem here is that with this use case our structure starts to look like that central server structure that we're trying to avoid with that single point of failure so we're trying to move away from this so to avoid this we added an evaluation step so when another peer would connect instead of automatically connecting to that peer that they were pointing to we have them kind of evaluate that request so peer 1 will evaluate the request and see if what it's reached what we call maximum connections at the moment our check is pretty naive and mostly a factor of what we found to work in practice so right now if the number of connections is greater than 5 or more than half the network we forward that along to another user so in this case peer one sees that it's already connected to more than half the network it's gonna Ford it along to another user peer for grabs that request it sees that it hat it doesn't have too many connections and it accepts that connection so in a way we've started to kind of better distribute our network it's not perfect yet but it's a lot better than one single point of failure alright so now it's time for a demo okay so traditionally I mean typically what we want you to do is open up Conclave we've got your sharing link here you can click this you copy that URL and you can send that to someone whether it be a slack zoom however however email however you want to do that in this case I'm just going to open it up and it's in the same tab in a new tab and I'll move this over here and as you can see pretty quickly there another peer was found they connected pretty quickly there so these are two peers connected in one network and we've automatically assigned an animal name and a color to them so I'm the peafowl I could say hi dog and then coming over here and now we have all the basic operations here so you know you could cut come over here cut and paste and then let's say what a third user one of the join and as you can see here we already have text that's been created for that user I mean for this document so what's going to happen there there's our initial Lodhi and it happened pretty fast but they were sent all the existing document and they're able to collaborate from that same point so all three users are in the same document state we also have some other neat features one cool thing is remote cursors as you might have seen you can keep track of who's who's wearing the document we also have video calling which is pretty simple to implement once you have web RTC setup so you can place calls and accept them quickly wrapping up here has anyone heard of github teletype one person - ok so teletype is an atom plugin that github released in November right as we were wrapping up development and it does pretty much the same thing obviously it's on a web application it's an atom plug-in the cool thing is though it uses pretty much the same technology that we used he uses CR DT to resolve conflicts and uses WebRTC to send data between users so that was pretty cool to see if you want to try editor out we have a browser embeddable NPM package so you can set that up in your own application future plans as I talked about our optimizations are mostly done from manual testing you know we'd have the three of us have documents and on each other's computers or we'd have multiple tabs on one of our computers that's pretty limiting you know we can't we can't we're not able to scale up and test tons of users we're not able to scale up and test different levels of latency or users in different geographic locations so if we could automate testing we'd be able to do that and kind of really push conclave and see how far it really can go the other thing is up we want to optimize mass insertions and deletions right now our CEO TT handles operations one at a time so when you let's say you like I did that cut and paste with multiple characters our CEO T is still processing that one at a time so if I were to try to cut if I were to try to cut you know a hundred characters it would take a long time to process those characters and it might hang a little bit and the last thing is replacing pjs that library we were using it's a great library it made things a lot easier but it's not been maintained no longer so for instance Safari got WebRTC support in the last year period is still doesn't support Safari so our app does not support Safari right now couple more things I kept are present my presentation a little bit of a higher level if you want to learn more and dive deeper into how things work we've wrote up a case study that's available at that link also if you want to contact me I'm getting savate I'm currently looking for a full time work so if you work at a cool company and you think I might be a good fit I love love to talk to you so come find me after thank you [Applause]

Info

Channel: JavaScriptMN

Views: 7,515

Rating: undefined out of 5

Keywords: javascript, real-time programming, javascriptmn, webrtc

Id: hy0ePbpna5Y

Channel Id: undefined

Length: 28min 11sec (1691 seconds)

Published: Tue Feb 06 2018