Speakeasy JS – Deno security (Ryan Dahl) & Content-addressed distributed data structures (Rod Vagg)

[Music] Hello everyone, welcome! We're here for another edition of SpeakEasy JS. I'm Feross, and we have here all of our panelists for this week, as well as the speakers who are going to be sharing their talks with us. I don't know, everyone just say hi, or say whatever you want; go around, maybe. Hello! Yeah, cool. We have Andrew, Paul, Rod, Michael, and our guests for today, Rod and Ryan. Okay, so first up we're going to do Rod's talk. Rod is going to be speaking to us about... well, actually, I think it's going to be an introduction to content-addressed distributed data structures. He's been working at Protocol Labs recently on IPLD and a whole bunch of interesting peer-to-peer primitives, so I'm really glad that he's doing this talk and going to be giving us all a good schooling on this kind of stuff. So go ahead, Rod, take it away.

Yeah, so my mic is working, I assume? Yep? Oh good. Okay, so what I'm doing today: there's a bunch of things I thought I could talk about, but they all feel like they need background, because they jump straight into this whole idea of content-addressed storage, content addressing of data, and I just don't think that's a generally understood concept in our industry, like it should be. So I figure the best way to deal with that is to back right up to the beginning and just talk about the basics. I know for a lot of people in here this is going to be really basic stuff. This is nothing fancy, this is not mad science; I'm just trying to cover some introductory material. But I'd love to get some feedback as we go. If there are questions coming in, feel free to interrupt me, Feross, if they come in live, or anyone else in the room. But yeah, let's get into it and look at content addressing and data structures and how these things fit together. Okay, content addressing: what is it? Well, content addressing
basically means the address of a piece of content is derived from the content itself, and it's typically a hash digest. You take the content, which is some arbitrary array of bytes (it could be text, it could be video, whatever it is), you run it through a hash function, and from that you derive your address. Common examples we're all used to: Git uses a SHA-1 of a change set to your repository, so the Git commit address, or hash, is this SHA-1. In Bitcoin, the hash function used is a double SHA-256 over the 80-byte header of a Bitcoin block. We're used to seeing these Bitcoin hashes thrown around; that's what that is. That's the address, and the content is this other thing that you can refer to.

So why would you do that? There are a number of reasons. First of all, it's secure, because what it gives you is automatic authentication built into the addressing: the address authenticates the content. I don't even need to trust you if you're giving me the content that I've asked for by its address. If I say, "I've got this hash that addresses a Bitcoin block, can you give me the data?", and I know how to run the hash function, then I can authenticate that the data is what I asked for. So you get this free authentication that comes with it. It's immutable: it can't change. Say I've got a news website and I give the address of some news article to someone else to have a look at; I don't have to worry that they go there and the article has completely changed. I know that the address will always give the same thing, and if I need to refer to a new version of it, I need a new address. The other thing, which is a little bit more subtle, is that you get deduplication. Because the hash function will output the same address for the same data, there are a lot of places where you get to say I don't
need to store or refer to this same thing again, because I already have it, or it's the same as something that already exists. I won't go into that more, but it might come out in some of the later discussion.

Another term that you hear in this space is Merkle trees. Merkle trees come from the guy who coined them, Ralph Merkle; the idea comes from a patent in 1979. A Merkle tree basically says that the content being hashed may also contain hash digests of other content. Therefore any content address which authenticates content also automatically authenticates the content linked below it by the inclusion of its digest. If you look at the little figure from the patent, you'll see down the bottom you've got some raw data. Then you're hashing chunks of it, appending hash digests together and hashing those, and passing that all the way up this binary tree to the top, where you've got one hash. That hash authenticates each of those chunks individually as being part of the tree. Merkle trees are also known as authentication trees or hash trees, and you'll see later that this extends all the way to the concept of a blockchain. It's important to note here that a Merkle tree doesn't strictly refer to this appending of hashes together; a Merkle tree is simply the inclusion of a hash in some data. Just by including a hash in your content and then hashing that, you're creating a Merkle tree. They don't have to be in this strict binary form, and they don't have to include hashes only in intermediate nodes, but you will often see this pattern, as we'll get into in some examples.

Another term, and this came up a few weeks ago when Michael did his talk on zdag, his block format, is DAG: directed acyclic graph. This comes directly from graph theory; it's not specifically about content addressing, but we apply it here. A DAG is directed, which
means that the linkages between the nodes flow in a particular direction. In non-directed graphs there's no directionality to the linkages, but in a DAG there is. Acyclic means that there can be no cycles in the graph: the linkages flow one way and there are no loops. We can see that Merkle trees, or these Merkle structures, are DAGs, because a hash function is directional. You can't reverse a hash function; that's one of the points of them. So the hash of the data points to the data, and there's a directionality. And there are no cycles: you can't link to content that doesn't already exist, because you can't create a hash for something that you don't already have, so there's a time component to it as well. So a Merkle tree is sometimes called a Merkle DAG; these terms are often mixed up together.

Let's look at how you might use this practically. A good example that you'll see in the wild a lot is file systems. You'll see this in IPFS, and I'll show you another example where it's used straight after this. Let's say we want to build a file system of distributed chunks. My files are these distributed pieces: I've got eight files here, I hash all of the files, and I get the little addresses for them. Then I include those addresses, these little hash digests, in these directory chunks. The directory might contain names of files, it might contain the name of the directory itself, whatever it is; it's just some kind of blob of data that has those hashes in it, as well as some other metadata that identifies it as a directory. In this example I've got directory one pointing to directory two, which is also pointing to files, so I've got a multi-level directory structure here. Everything in this graph gets an address, or a link. There are ten pieces here, eight files and two directories, and you can point to any of them individually as well. You don't just
have to point to the directories; you could pull out a single file and you've got a link to that as well, because they've all got their own addresses.

Here's one place where that structure is used: Git. Git adds this extra layer of commits above it; commits are data structures that it layers on top of files. Git is a Merkle DAG as well, with the same things going on. It uses SHA-1 addresses, and everything in here has a SHA-1. This is a very simplified version of Git, by the way. Down the bottom we've got these blobs, which are generally the files in our Git repository. They bubble up into trees, and trees can also point to other trees, so you get your directory structure, essentially. Those trees are hashed as well, and they're put into these commits, which is where we get our little commit strings from. A commit contains the author, the committer, timestamps, and the message. It points to the tree data, but it also points to its parent, which is the previous commit that you're building on top of. So you build this directed graph, and everything's hashed, so it's a Merkle DAG. In this example, starting from the right-hand side, that's an original commit with four files in a plain Git repo. The next commit introduces two more files but still refers to the original four. Then the next commit introduces a directory structure: it removes files, it adds new ones, and it makes the Git repository more complicated. But each of these things again has its own SHA-1 address. This is a Merkle DAG; there's directionality; this is what we're talking about with content addressing.

Another example is a blockchain. Bitcoin in particular is an easy one to talk about, because so many blockchains are just forks of its code and use the same structure. Ethereum has a different base, and some of the newer blockchains do as well, but this is a simplified version of Bitcoin.
When Bitcoin refers to a block, it's actually referring to this data structure that contains multiple independently hashed things. What you have as a Bitcoin block address is actually the hash of the header, 80 bytes of header. In that header there's a hash inclusion that points to this thing called the tx Merkle root. That is a binary Merkle tree like we saw in the Merkle patent, where it's actually just appending hashes together to form this strict binary tree, and at the leaves of it are hashes that point to the transactions in the block itself. The header also points to the previous block, and that's where we get the blockchain from. The blockchain is actually headers pointing to headers all the way back to the genesis block, and the headers themselves point to their own transactions, so they're all bundled up. One of the interesting things about Bitcoin is that this binary Merkle tree structure on the side there is actually not included in the bytes that are shipped around between Bitcoin nodes; they only include the transactions. The reason for that is that they can derive the binary Merkle tree from the transactions. The header includes the Merkle root, and if you've got the transactions you can regenerate that root by rebuilding the binary Merkle tree. So again, it's a DAG, it's a Merkle tree, same thing.

Backing up to our example with directory one: this concept of a root is really important. A root is a single thing that you can hold on to that points to all of the things that you want. In this case directory one is our root; it points down to these nine other pieces, and this one reference can refer to an entire graph. We could also choose directory two as our one reference and hold on to that, if that's all we care about, but then we'd only get the four files underneath it. We could create a new one and hold on to that, and maybe
that one points to directory two but doesn't point to directory one, so that's sort of lost to us somewhere else, and we add two more files.

We can actually include mutability in this. I said at the beginning that immutability was one of the properties of this, but you can get mutability of a sort, and I'll show you what I mean by that. If you hold on to a root that points to your data structure, keep this root as the important concept, and change the root over time, then you get a kind of mutability. In this example, let's say I want to change file one: maybe I edit file one and put some new bytes in it. That changes its hash, and that hash then bubbles up to the directory one blob, which needs to change. If I just create a new directory one with a new hash, then I've got a new root. That points to this new graph where I've got the edited file one but the rest is the same. Now, I could also hold on to the original root if I wanted to. One of the interesting properties of that is that you get these snapshots over time: I could actually walk back in time, if I had access to all of these pieces, these blobs, stored in my store somewhere.

This extends to all the kinds of operations you could do with data structures. Let's say you've got some kind of data structure, maybe a B-tree, with some predictable layout, and you want to perform some operations on it. The root is the thing I'm hanging on to, and I can navigate from the root to everywhere using some specific algorithm. Let's say I want to add elements, remove elements, shuffle things around: I can do a bunch of operations and generate a new root. You can see here I've got snapshotting, so I could walk back in time to the original root. But I've also got the ability to see all the operations I've performed there, adding things, removing things, and... what was I going to
say about that? I forgot what I was going to say there; there was an important point that slipped my mind. But basically, if I've got predictable algorithms I can run across these things, I can mutate these graphs and hold on to the root. I could discard the old root and maybe garbage collect the elements that I don't care about anymore; that really depends on the system I'm working within.

So, algorithms for working with these things. This is quite a complex area, because we're dealing with really complex systems, especially distributed systems, where the nature of the system comes into play. So how do we choose or design algorithms? We need to ask questions like: is mutability needed? Do we need to be able to mutate this thing over time, or do we just want to create it up front and then ship it to the world? Maybe I have a video file I want to package up into blocks and ship to the world, and that's all I care about, so mutability is not needed. Or maybe it is: maybe I want to build a distributed package manager that gets updated over time, so I need to ship new roots around all the time. What does that look like? What other operations do I need to run? Traversal: how do I want to get through my graphs? Do I need sorted traversal? Do I need to iterate over the whole lot, or do I just want to go to individual elements? Do I need to understand the size of these things? You've got to understand your priorities to get to these algorithms. And how much data should you fit into a block, these single things that are hashed? That's going to be very dependent on the system, because the nature of the storage and distribution medium will impact all of these decisions. Maybe I'm using IPFS, where I've got a recommended maximum size of a meg, but very commonly you'll have blocks much smaller than a meg as well. You want to ship these pieces around, and the size of
the block will impact a whole bunch of other things.

This stuff's not new, though. Distributed systems like this are fairly new, but these algorithms are not new at all, because persistent and immutable data structures have been around for quite a long time and have been well researched. We find them a lot in functional programming languages, and in functional programming libraries that you can use in non-functional languages. The standard libraries of Scala, Clojure, Haskell, et cetera are all full of data structures that would translate almost directly into this world. Things like HAMTs (hash array mapped tries) fit directly in here, and you can create distributed versions of them using the same algorithms. But algorithm selection requires careful consideration of all the trade-offs I mentioned. So there's this whack-a-mole you have to play: what is the nature of the system I'm using, and what are the kinds of operations I want to perform on it? That's how you figure out which algorithms you want. Am I over time, Feross? Should I stop here? Because I do have an example I wanted to get into.

Yeah, you have time.

Okay, I've got a few slides here just to quickly build out an example. Let's say I want to build a multi-block, content-addressed, super-large array. I've got data that I want to live in content-addressed land, and this thing could be arbitrarily large. Perhaps it's so large I couldn't fit it in memory on my own computer; perhaps it's so large I couldn't even fit it on my own disk. So it's going to live out there in this content-addressed world, maybe on IPFS, maybe I'm putting these blocks into S3. They're out there somewhere, and I just want to be able to navigate through this array and store things in it. I want it to be generic: the encoding format doesn't matter. This could be JSON, this could be CBOR, this could be zdag. As long as I can hash these blocks, I can use hashes as the links, and I can store and load this data
with hashes. So if I've got an operation where I can fetch by hash and store by hash, that's all I care about. The leaf elements in this array could be anything. Maybe they're simple integers, maybe they're complex objects, or maybe they're hashes themselves; maybe I want an array of hashes because I'm building npm, I don't know. So let's build a super-large array algorithm, just quickly. Let's start off with a block and define a maximum width, so I know how big these blocks can get. I'm using JSON here as an illustration, so these blocks are just JSON arrays, and I'm defining a width of five. If I store an array of four things, I can hash that, and that's my root: I have a root that is a single block. Great. I can now go into my block and fetch elements out of it. That's easy, and I can do it up to the maximum width, so up to five elements in my little block here. In the simple, one-block case, if I mutate that block by removing or adding elements, I get a new root address. But as you can see, if I were to remove element five from the second version of the array, I should actually get back the same hash as the first version. So it's like stepping back in time, even though I performed a removal; that's one of the nice properties of content addressing. Okay, so what happens when we add more elements and go beyond the fixed width? We add a new root that refers to an additional block. Overflowing our maximum width means we have to make a new block to start storing elements in, and that new block can grow to the same width before it overflows. Then each of those blocks gets hashed, and we store those hash addresses in a new root block. The new root block has the same rules: it also has a maximum width of 5.
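The store-by-hash and fetch-by-hash contract, and the single-block behavior Rod describes, can be sketched as follows. The sketch makes several assumptions. It hashes JSON bytes with SHA-256 through Node's crypto module, and an in-memory Map stands in for IPFS or S3. The names BlockStore, putLeaf and readLeaf are hypothetical. The talk itself is codec- and hash-agnostic, so this is one possible instance, not the IPLD code.

```typescript
import { createHash } from "node:crypto";

// Assumption: addresses are hex SHA-256 digests of the encoded block bytes.
type Address = string;

const sha256 = (bytes: Uint8Array): Address =>
  createHash("sha256").update(bytes).digest("hex");

// Hypothetical in-memory stand-in for a content-addressed store (IPFS, S3, ...).
class BlockStore {
  private blocks = new Map<Address, Uint8Array>();

  get size(): number {
    return this.blocks.size;
  }

  // Store by hash: the address is derived from the content itself,
  // so identical content deduplicates to a single entry.
  put(bytes: Uint8Array): Address {
    const addr = sha256(bytes);
    this.blocks.set(addr, bytes);
    return addr;
  }

  // Fetch by hash, re-hashing so the address authenticates the content.
  get(addr: Address): Uint8Array {
    const bytes = this.blocks.get(addr);
    if (bytes === undefined) throw new Error(`missing block ${addr}`);
    if (sha256(bytes) !== addr) throw new Error(`block ${addr} failed verification`);
    return bytes;
  }
}

const WIDTH = 5; // maximum elements per block, as in the talk's example

// Store a single height-1 block as a JSON array, enforcing the width.
function putLeaf(store: BlockStore, elements: number[]): Address {
  if (elements.length > WIDTH) throw new RangeError("block overflow: a new height is needed");
  return store.put(new TextEncoder().encode(JSON.stringify(elements)));
}

function readLeaf(store: BlockStore, addr: Address): number[] {
  return JSON.parse(new TextDecoder().decode(store.get(addr))) as number[];
}

const store = new BlockStore();
const v1 = putLeaf(store, [1, 2, 3, 4]);
const v2 = putLeaf(store, [...readLeaf(store, v1), 5]); // append: a new root
const v3 = putLeaf(store, readLeaf(store, v2).slice(0, 4)); // remove element 5 again

console.log(v1 === v3); // true: same bytes, same address
console.log(v1 === v2); // false
```

The address is recomputed on every `get`, so a store that handed back tampered bytes would be caught. This is the free authentication from earlier in the talk. Because v1 and v3 encode identical bytes, the store ends up holding only two blocks.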
So let's call this a height of 2 now: at height 1 are the actual elements, and height 2 is this root block that has links to the height 1 blocks. If you think about it, if our height 1 blocks are constrained to 5 elements, then at height 2 we have a maximum capacity in our array of 25 elements, because we can fit five blocks of five before we overflow our single block at height 2. But that's fine, because we can overflow that as well. So there's our maximum-capacity height 2 data structure: a single root block pointing to five different blocks. If we overflow that by adding a new element, we add a new height, and that new height refers back down to height 2. So we've got height 3 here, which refers to height 2, which refers to height 1. At a height of 3 with a width of five, we have a new capacity of 125 before we overflow that height. So we can just keep adding things to this, and it grows outwards and upwards. Right, traversal is interesting. We've got this reference to the root, and we want to traverse to index 21.
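A minimal sketch of the capacity arithmetic follows. With a fixed width w, a tree of height h holds at most w^h elements. The sketch also shows the "outwards and upwards" growth by building the tree bottom-up from a plain list. Its assumptions are SHA-256 over JSON text, an in-memory Map as the block store, and the hypothetical helpers put, capacity and build. It is not the IPLD Vector spec's algorithm, which also covers incremental appends and deletes.

```typescript
import { createHash } from "node:crypto";

type Address = string;

// Hypothetical in-memory block store: address -> JSON text of the block.
const blocks = new Map<Address, string>();

// Store a block under the SHA-256 of its JSON encoding and return the address.
function put(value: unknown): Address {
  const json = JSON.stringify(value);
  const addr = createHash("sha256").update(json).digest("hex");
  blocks.set(addr, json);
  return addr;
}

// With a fixed width w, a tree of height h holds at most w^h elements.
const capacity = (width: number, height: number): number => width ** height;

// Build bottom-up: chunk the elements into blocks of `width`, hash each block,
// then chunk those addresses the same way, until a single root block remains.
function build(elements: unknown[], width: number): { root: Address; height: number } {
  if (elements.length === 0) return { root: put([]), height: 1 };
  let level: unknown[] = elements;
  let height = 1;
  while (true) {
    const addresses: Address[] = [];
    for (let i = 0; i < level.length; i += width) {
      addresses.push(put(level.slice(i, i + width)));
    }
    if (addresses.length === 1) return { root: addresses[0], height };
    level = addresses; // the next height up links to the blocks just written
    height += 1;
  }
}

const items = Array.from({ length: 26 }, (_, i) => i);
console.log(capacity(5, 2), capacity(5, 3)); // 25 125
console.log(build(items.slice(0, 25), 5).height); // 2
console.log(build(items, 5).height); // 3: the 26th element forces a new height
```

Packing blocks left to right like this gives every index a single deterministic path from the root. That property is what the division-and-modulo traversal relies on.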
How do we do that? We need an algorithm where we can enter at the root and then figure out where to go at each point in this graph. The algorithm is not too hard, and there's some JavaScript here to illustrate it. Basically, you're slicing off part of the index at each height, and to do that we can use division and modulo operations. Each node needs to know what height it's at, and it needs to know how to get to its child elements if it's not at height 1. So if we have this root and we say "get me index number 21", then we run this little algorithm. It says: okay, I'm not at height 1, so I have child elements, and I need to figure out which child element to call into. You slice off some bits by running this division (I'm not going to go into the details here). You get the index of the child element and a new index to pass to that child element. In this example, the child element would be the first element, actually index zero, and the new index for it would still be 21, because I'm not slicing anything off. Then I pass into that child element and say, "I want you to get me index number 21."
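Rod's slide illustrates this step in JavaScript. The following TypeScript is a separate, hedged sketch of only the index-splitting arithmetic he describes. The hypothetical function traversalPath returns the slot chosen at each height, from the root down. It is neither the code from the slides nor the IPLD Vector implementation.

```typescript
// Hypothetical helper: split a flat index into one slot per level, from the
// root (at `height`) down to height 1, using division and modulo.
function traversalPath(index: number, height: number, width: number): number[] {
  if (!Number.isInteger(index) || index < 0 || index >= width ** height) {
    throw new RangeError(`index ${index} is outside a height-${height} tree`);
  }
  const slots: number[] = [];
  let remaining = index;
  for (let h = height; h >= 1; h--) {
    // Number of elements reachable through each entry of a block at height h.
    const span = width ** (h - 1);
    // Which entry (a child address, or a leaf element at h = 1) to follow here.
    slots.push(Math.floor(remaining / span));
    // The index to hand down to that child.
    remaining = remaining % span;
  }
  return slots;
}

traversalPath(21, 3, 5); // returns [0, 4, 1]
traversalPath(26, 3, 5); // returns [1, 0, 1]
traversalPath(21, 2, 5); // returns [4, 1]
```

With a width of 5, the walkthrough's first step corresponds to a root at height 3: child zero, with index 21 passed down unchanged. A height-2 root would instead send index 21 to child 4 with index 1. Index 26 takes a different path from the very first level. A full `get` would load the block at each step and follow the address stored at the chosen slot.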
But if I asked for 26, it would be a different path. So basically this is just an algorithm to traverse from the root to any node. And you need more algorithms, too. You need things like size: if I want to know how big this thing is, there's an algorithm to get its size. If I want to delete elements, that's a whole other algorithm.

Just quickly, the properties of this array. This is just an example; I'm not saying this is an array that would be good to use in every situation. It might be useful for some situations, but we need to understand the properties of the various algorithms in order to judge whether it's useful, and also how to tune it. We can see that a larger width means larger blocks and fewer levels, which might be good. Maybe I want blocks of up to a meg, which would be quite large. Mutation could then be expensive, because I'm rewriting and garbage collecting quite large blocks, but I have fewer levels to navigate. Maybe I've got network latency weighing into this as well, so there's a whole bunch of reasons you might tune the width to suit your environment. Maybe all these blocks are local, so a small width is great. The get and size operations are pretty efficient: they just need to traverse down to height 1, single blocks, and you can get the answers to these operations. Appending data is pretty efficient too. It only requires mutating a maximum of one node at each level, and at some levels you don't even need to do that, for instance at the boundary case where you're just adding nodes. Ordered iteration is really simple: it's just a left-to-right tree traversal. This thing just folds out into a normal tree structure; you do a left-to-right traversal, and that's your ordered array iteration. Slicing this array is possible, but it's only efficient if you perform it at the boundaries of your block width. So if I were to slice at
boundaries of five, great. If I were to do anything else, I'd be rewriting everything to create a new version that fits into the new structure. Prepending data, putting it at the front, is really costly. The algorithms to do that can work, again, if you're dealing with boundaries, but it's very costly because you end up rewriting everything and mutating the whole structure. Deleting elements other than at the tail is very costly as well, because you're shuffling things up, rewriting and mutating data. And lastly, it can't handle sparse data: if you have lots of blanks, it'll be very wasteful. There are a bunch more properties to this. We actually have a spec for this; we call it Vector in IPLD. If you go to the IPLD specs repo on GitHub, there's a whole write-up on the various algorithms to traverse and mutate this data structure, which is very interesting. There's a JavaScript implementation of it and a Go implementation you can have a look at as well if you're interested. But that's all I have. Hopefully that quick example was illustrative, and hopefully I haven't used up too much time.

No, it's all good. Thanks a lot, Rod, that was amazing. I wish we had an applause track or something to play right now, because it's not the same at the end of a virtual talk. But yeah, I also wish you had gone earlier than all these other talks we had. You really laid out a lot of the basic primitives in a way that makes the other talks easier to understand. I think a lot of times people jump into the more interesting, advanced stuff about whatever they've built, and if you're watching a talk like that it can be really confusing. You might think, "Am I missing something here? Where's the context?" And there's just not enough time in a 15-minute talk, where Michael is introducing zdag or whatever, to also
explain what a hash function is and all these other things. So yeah, really great introduction, thanks for doing that.

That's why I put this together. I was listening to Michael, and also Mikola, and the talks about Hypercore and all the rest of them. They're all the same sort of thing, where you need at least a little bit of background. I know when I started getting into this, I needed all this context, so hopefully that was helpful.

Yeah, totally. I really liked the diagrams. I thought color coding them, and being able to go step by step, slide by slide, through these similar data structures, was really effective for understanding all the similarities, so I really appreciated that. We don't have any questions right now from the chat. Somebody mentioned something that was more of a comment than a question. They said that the trick of shifting the root node to another root node whenever you change one of the lower nodes, the leaf nodes, gives you a fully functional data structure, so content addressability can really help. I think you mentioned that a little bit, but that's super cool.

Oh, there's also this whole new field that's really going mad in research at the moment, which is conflict-free data structures. A lot of those touch on the same kinds of areas, because immutability and functional data structures play into the same strengths. There's tons of research going on for massively parallel computing and distributed computing where this stuff just overlaps.

There was one question in the chat that I thought would be good. They're asking how indexing and addressing still maintain the immutability property of content-hash addressing in this context, because as you were describing the algorithm, it was an index you were using. So
could you elaborate on that particular question?

Right. So in my example, I was relying on the fact that each of the blocks was storing an array, and I could index into a block. I'm using JSON here because I think a lot of us think in JSON when we're talking about serialization. Say I store a JSON array as my block, maybe putting it on S3; when I load it, I can index into that block. For the height 1 blocks, the entries are actual elements, but I've still only got indexes zero through four in every block, and at the other heights all of the entries are addresses. So the algorithm really says: if I'm coming in at the top and I know the height, how do I modify my index? I need to know which element to look up at this level, and which index to pass on to the next level. You're really dividing your graph as you traverse down, and you want to maintain this index so that it gets to the right point. What you end up with is this fascinating behavior where the index changes during the navigation, but you as a user don't need to care about that. The algorithm takes care of changing the index as it passes down through the nodes of this data structure. If you're interested, look up the spec, because there are some interesting algorithmic properties here.

This is super cool. I'm getting flashbacks to when I took operating systems in college. You have an inode with an inode number, and you have to go to the indirect or doubly indirect block and then finally find where the data is. It sounds really tricky, like a bitmap that you have to debug to get right.

Yeah. One of the talks I wanted to give was a talk on HAMTs, because HAMTs are
really fascinating data structures here, and there's so much interesting math and bit shifting that goes on in them. Maybe we'll do a talk on that later. But it's the same kind of thing, where you're navigating down, mutating these indexes, and looking things up, and there's fascinating, simple math going on.

Super cool. I wish we had time for more questions, but we've got to move on now to the next talk, which is going to be from Ryan Dahl. For those who don't know, Ryan made Node.js, and now he's working on Deno, and I'm super hyped to hear what he's going to talk about today. So Ryan, go ahead, take it away, the stage is yours. I think you're on mute, though.

Can you hear me now?

Yeah, we can.

Cool. Hello, hi. So I'm going to switch screens here; can you guys see this? Okay, I've just got a laptop, so I can't see anything other than this. Michael asked me to give a talk about the security model in Deno, so I thought I'd give a quick overview and some demos to explain it. I think it's pretty straightforward, but if anybody has any questions, feel free to interrupt me. I guess you'll have to interrupt me audibly, because I can only see my screen right now.

So, right: Deno is a runtime for JavaScript. It's a project I reluctantly got into a couple of years ago. When you've worked on Node for as long as I did, you can't help but think about how things could be done differently. Deno started out as my way of playing around with how we might be able to make a really simple scripting system. I like scripting languages a lot. I end up programming most of my days in Rust these days, but it's really nice to be able to just throw down some scripts sometimes, hack something together, rename some files. And I'm just really unsatisfied with the state of the world when it comes to scripting languages. You
know, Python is okay, but it's not the same thing that works in the web browser, and it's quite slow; Lua is one-indexed, so totally ridiculous; we've got Ruby, but it's ridiculously slow too. It just seems like if there's going to be one scripting language in the world, it might as well be JavaScript, and this is my attempt to make a little bit of tooling that I could use. Anyway, let's see here, I have to exit this. Okay, so I'm going to show you some examples, and we'll talk about the security implications as we go through them. Let me just copy this into my command line. Wait, before I do that: first of all, Deno is a command-line utility, and what's cool about it is that it can run raw URLs, so you don't have to download anything. Where's that? Let's open... oh wait, this is a link to the GitHub page, not the actual source code. Let's grab the actual source code here. So this is a little file that implements the Unix cat program, where you give it a file name and it prints the file to standard out; it just loops through the arguments of the program. Let me just replace that. Oh jeez, I'm sorry guys. You know: run URL. Let me take out this --allow-read first of all. When you run this program it's supposed to take some arguments, so without any arguments it loops through nothing and doesn't do anything. Notice how it downloaded this program the first time it ran but didn't download anything after that; that's because it caches things in a secret directory somewhere. What you're supposed to do is give it files: this cat program can cat files to standard out, print them to
standard out. So let's try to print my /etc/passwd file, and I get an error that says "Uncaught PermissionDenied: read access to /etc/passwd, run again with the --allow-read flag." So this starts getting into what Deno's permission model is. It's a program for your command line, which is trying to interact with your system, but the V8 engine that we're using is actually a secure sandbox, even though we're poking holes in it so that you can open /etc/passwd and interact with your operating system. If we're really careful about it, we can control access to the system in the same way that the web browser does: a website can get access to your camera and your video feed, but you have to opt into that. I like this permissions model where you just opt into stuff, so in order for this program to be able to access this file, it needs --allow-read, and there we go, it printed out /etc/passwd. Since we're talking about security in detail: --allow-read can be limited to a subtree of your file system. So I could say you only have access to /tmp but not /etc. This should still fail, I think, because /etc/passwd is not inside the /tmp directory, but if I do something like this, it should work. So these flags are kind of dumb, but they have some flexibility to them. I shouldn't say dumb; they're just not very fine-grained permissions. Okay, so that's example one. Let's look at a second example. For this one I'm using a URL on the deno.land website. Wait, quick side
note about this deno.land website and these URLs. This is some file, whatever, it's called gist.ts, who knows what it does. When you go to it in the browser you get this pretty HTML page, but what's really fancy is that when you curl it, you get the raw text. We have a fancy system where it inspects the Accept header of the incoming request, so it knows if you're a web browser, and if so it displays this really nice page; otherwise it just gives you the raw content. Anyway, digression. So this program connects to the GitHub gist service and allows you to put a file up there. What I'm going to try to do is gist my /etc/passwd. Of course, /etc/passwd has the shadow, whatever, so this should be safe. Notice that we've got --allow-read, because it's going to need to read whatever file I pass it; I'm going to pass it /etc/passwd again, so it needs --allow-read to do that. But we don't just give out network access willy-nilly here. A program could read your SSH keys and then, with network access, send those out in some sort of outbound request, so we don't give network access by default. In fact, by default you don't get any access at all. When you run any Deno program without an allow flag, this should be... I'm not sure I'm willing to bet my life on it, but theoretically this is a secure sandbox, and there's no way to break out of this VM; supposedly, if you could, it would be a bug. Let's put it that way. All right, so let me put in these flags here. It needs --allow-env because I have a personal access token for GitHub, so that I can actually post as myself. Right, so here's the command.
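The subtree scoping of --allow-read demonstrated above (access to /tmp but not /etc, and no access at all without a flag) can be modeled in a few lines. This is a minimal sketch in TypeScript under stated assumptions, not Deno's actual permission checker: the names ReadGrant, normalizePath, and canRead and the "none"/"all"/list grant shape are hypothetical, and the sketch only handles absolute POSIX paths, ignoring relative paths and symlinks, which a real checker would also have to consider.

```typescript
// Toy model of subtree-scoped read permissions, in the spirit of
// `deno run --allow-read=/tmp`. Hypothetical names; not Deno's internals.

// "none" = no --allow-read flag, "all" = bare --allow-read,
// string[] = --allow-read=<path>[,<path>...]
type ReadGrant = "none" | "all" | string[];

// Collapse "", "." and ".." segments of an absolute POSIX path so that
// "/tmp/../etc/passwd" cannot sneak past a "/tmp" grant.
function normalizePath(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      parts.pop(); // ".." at the root stays at the root
    } else {
      parts.push(segment);
    }
  }
  return "/" + parts.join("/");
}

// Default deny: a read is allowed only if the normalized target equals a
// granted root or lies underneath it (compared segment-wise, so that a
// "/tmp" grant does not cover "/tmpfoo").
function canRead(grant: ReadGrant, requested: string): boolean {
  if (grant === "none") return false;
  if (grant === "all") return true;
  const target = normalizePath(requested);
  return grant.some((root) => {
    const base = normalizePath(root);
    return base === "/" || target === base || target.startsWith(base + "/");
  });
}

console.log(canRead("none", "/etc/passwd")); // false: no flag, no access
console.log(canRead(["/tmp"], "/etc/passwd")); // false: outside the subtree
console.log(canRead(["/tmp"], "/tmp/notes.txt")); // true
```

Comparing whole path segments, rather than a bare string prefix, is the design choice that keeps a /tmp grant from also covering a sibling directory such as /tmpfoo.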
Does it look okay? And it doesn't work: it says it needs access to api.github.com. Oh, okay, I have the wrong domain name here. I mentioned earlier that --allow-read can be targeted to a certain subtree; --allow-net can be targeted to a certain host, and I think we actually support sub-paths of URLs too, but I'm just going to put api.github.com there. What I'm saying is that you can do whatever you want network-wise as long as it's always to this server, and not, say, gist.github.com. So let's try that. It's going... okay, all right, that seems to have worked. Great, there's my /etc/passwd. Right, let's try something slightly more complicated. This is a little program that somebody developed that searches for usernames on websites. I'm not sure why they built it, to be honest; I'm using it just because it's an example of a somewhat non-trivial program, and you can run this thing directly. When you run it, what I'm saying is: you can't access my file system at all. No way, Sherlock, I don't know what the hell you are. What you can do is access the network. And I'm too lazy; I can't be bothered to download your cli.ts file or whatever, I just want to try out your program. What does it say? "Fatal error: username contains invalid characters." Why is that? Okay, insert username. So I'm going to try ry, which is my typical username, and yeah, this program does something. Blogger's still a thing, apparently. So this is maybe a silly example, but it gives you a feel that we have these installation-free programs available, and I think what's really cool is that this is all done
as well as we can; we're trying to stick to web standards here. Why is this... I think this is actually a redirect, and maybe I have to add -L. Right, so I'm just trying to take a look at this file here. If you're familiar with Node (and you guys are all extreme nerds here, so I'm sure you've heard my spiel about Deno and the imports before), we do use extensions in the import specifiers and module specifiers, and obviously you can import fully qualified absolute URLs as well. You might notice that this is all TypeScript. Deno has a ton of TypeScript stuff built in. It does of course support JavaScript; good old JavaScript should always work. It's kind of a struggle, when you support TypeScript, to get people to care about JavaScript, but I still care, and we try not to make it any slower to run JavaScript than if you were just using V8 directly; that is, it's not going through more compilation passes. TypeScript, by the way: so complicated, such a headache. Okay, so those are my demos. The security flags I demonstrated were --allow-read, --allow-env, and --allow-net. There's also --allow-write, to write to the file system. There's --allow-all, if you just don't care, or you're developing, or you just want to run things in Python mode, as I like to call it (maybe we should just call it allow-like-Python). --allow-run is actually very similar to --allow-all: it allows you to start a subprocess, and of course, once you start a subprocess, all security bets are off. That subprocess can do anything; we have no control over how subprocesses run, and they can access the network. Whether --allow-run and --allow-all should be the same thing was actually a long, potentially contentious debate. --allow-plugin is a new
feature that is not well supported yet, but we're developing the ability to have plugins: extension modules, DLLs, whatever you want to call them, compiled things that load into Deno to extend it. And there's --allow-hrtime, for high-resolution time, which is about guarding against the possibility of Spectre-style problems. This is pretty academic. Deno is a pretty new piece of software, and, like I said, I'm not sure I would bet my life on its security, so it's debatable whether something like that could actually be exploited. Actually, I messed up the order of these slides, so forgive me while I move this around a little bit. So the security aspect of Deno was built in from the start. I know there's a bunch of Node people in this group, so I'm more or less speaking to you guys: Deno is organized quite a bit differently than Node was. We don't allow people to bind V8 willy-nilly; you can't just create objects in C++ and stick them into the global scope. We actually don't allow you to create any objects with the V8 API. I'm lying a little bit, but let's say there are only two real native functions in Deno, and those are send and receive. Basically, what we've done is create a really efficient way of sending ArrayBuffers back and forth between your JavaScript runtime and Rust, and we use that ability to implement everything. Literally everything in Deno is built on top of this, which is really nice from a security perspective, because everything goes through one funnel: there aren't all of these
different entry points that you need to worry about in terms of security. We can have security guards in the right places and do all sorts of other stuff, like metrics. This is some sort of RPC mechanism, and we call the functions that go over this RPC mechanism ops. So open, for example, is an op, and read is an op, and this is modeled very much after the Linux and POSIX syscall setup. We have something like file descriptors, which are integers that refer to things allocated in Rust. We call them resources; we didn't want to call them file descriptors because we thought that would be confusing. So there's a resource table, and these resources basically correspond to sockets, standard I/O, or open files. From JavaScript you reference them as just a single integer and use them in the ops you're calling, to refer to more complex objects that are stored in Rust. The implementation of these ops is where we have these security checkpoints. I'm getting away a little bit from the security aspect, but on the general design: it's not as monolithic as Node is. There are actually three major components to the system. There's Deno, the deno executable itself, the CLI as we call it. Then there's deno_core, which you can think of as a really slimmed-down version of Deno: Deno without TypeScript, without any file system access, without any security things. All it is is this op system, which is our way of taking JavaScript promises and turning them into Rust futures. We worked a really long time to
make that connection usable. deno_core actually doesn't even depend on Tokio, so there's no event loop involved. All it is is an abstract way of (a) running JavaScript and (b) binding promises in JavaScript to futures in Rust, but without providing any bindings. And lower level than that is rusty_v8. V8 is written in C++; we're using Rust. We love Rust; Rust has saved my life. Having programmed C++ for many years, all I can say is that Rust is fabulous, and I will never start a new project in C++ again. Anyway, we had to bind the V8 C++ library to Rust, and this is very, very non-trivial. It's a very large C++ API; it's millions of lines of code and takes an hour to compile. We've put a lot of work into providing a Rust interface that is zero overhead, so you're handling the exact same pointers that you do in C++: the machine code that you generate should be exactly the same as the equivalent C++ code. But Rust has more type system features and can specify interfaces in a more detailed and precise way, and I wouldn't say we're 100% there, but we're 98% of the way towards providing a safe interface to this very complex system. It's hard to overstate how complex a system V8 is. By the way, please interrupt me if... sorry, I have to check my time here. Okay, let me just give you a brief overview of some of the other things Deno can do. We have all of these parsers built into Deno, so what we can do is, for
example, given a URL, we can parse out documentation like this, and you might notice it happens very fast. All of our parsers are written in Rust, and we use this on our website: provide a URL to any ES module JavaScript file, stick it into this box, and we should be able to generate some documentation for you. That's built into the command line. We also have a formatter, very much like Prettier but 100 times faster, I believe, possibly a thousand times faster. We have lint; we've just achieved compatibility with ESLint's recommended TypeScript rules, also, I believe, 100x faster, possibly 1,000x faster. It's much, much faster than ESLint; you'll think it's not running, but it has run, it's just written in Rust. We have a test system built in, and we have coverage; maybe this is worth showing you briefly. If I go into the Deno directory, I'll just show you: there's a Node compatibility layer that we have, so if you have some code that's requiring fs or whatever and you want to run it in Deno, we have a layer to do that. But let's just show what it looks like to run the tests with coverage on this directory. I'm just going to run deno test. It runs through all of the tests in there, and then it prints out some code coverage stuff at the end, so it tells us which of our code bits need to be worked on a bit more. We're interested in making this scripting language very useful, and that means adding tooling. I'm going to stop there. I've got all these other slides, but I feel like I've gone over time already, so maybe I'll just leave it there. Cool, well, thanks a lot, Ryan, that was amazing. I just want to say you're a pretty bold person to be playing with your /etc/passwd file on a live stream. Yeah, I know that it's not the shadow file, but
still. Yeah, it's shadowed; I think that was all solved in the early 2000s. That's funny. I just had a couple of questions that came to mind while you were speaking. I'm curious: you said the files are cached the first time you go to fetch a script. On later runs, are you actually checking with the server for whether it's been modified, like looking for ETags or something like that? We don't do that; we cache it forever. The idea is that when you run it, it's kind of like npm installing something, where it's cached forever. What we have is a --reload flag, which is supposed to be like the web browser's reload, and that will allow you to redownload those things if you want to. Okay, interesting. That sounds like a good opportunity: if you wanted to be really careful about what you reload, you might even have a way to look at each changed file, look at a diff between what's in the cache and what it currently is, and go file by file, for something really security-sensitive. That's an interesting opportunity. Yeah, we might do stuff like that; I'm not sure, actually. The idea is that sometimes you're on an airplane, and obviously we can't expect these URLs to always work. Just because we're using a URL doesn't mean that we go and visit it every time. It's just a module specifier, in the same way that the "express" string is a module specifier for Express on npm. It still corresponds to a URL and stuff; it's just that you're expected to run this command beforehand. Interesting. One other question I was wondering about: when people post their scripts on the internet and tell people, hey, run this in Deno, do they usually provide the
flags that they recommend you run it with, or do you just run it, it fails, and then you try adding what's necessary as you go? Usually people provide the flags. Okay, interesting. I'm wondering if there's any way to express up front, almost like a manifest or a comment at the top of the file, "these are the perms that we need," and then the CLI could withhold them until the user hits Y on each of them? Right, yeah. We actually had a system like that early on, and went back to this throwing behavior because it was just a bit too buggy, and there are a lot of situations where you want things to just fail. I think we actually still have that prompting ability in there; I'm not sure how to turn it on, though. Maybe it's gone at this point. Interesting. And one other question, which someone DM'd me privately: if I wanted to install something like lodash, some utility thing, and I don't want to give it any permissions, but I want my app to be able to talk to the network, is there any way to do import-level permissions, where I give different packages different permissions? There is not. It's been a much-heated topic of debate. I think everybody agrees this would be fabulous, but it's really tricky. I'm not convinced that it can't be done, but I'm also not convinced that it can be done securely, so I'm kind of waiting for somebody to do this research. If it's actually possible to do securely, we would totally do it, but I'm skeptical; there's a lot of monkey patching, and a lot of holes you can poke, in JavaScript. Right, and its usability would be pretty awkward too, wouldn't it? You'd need a big
manifest or huge command-line strings. It's really interesting, though. So is the JavaScript language the primary difficulty here? Is it just too dynamic and too unpredictable to scope things correctly to a specific import? I mean, yes, yes. [Laughter] JavaScript is very dynamic. It's really scary for us to make the claim that this module you've loaded over there can never make network access, when the rest of your program is in the same address space, in the same isolate. I would love to be convinced that this was possible; I'm just not really convinced of that. Right. Kind of a similar question: I know that a child process doesn't have to obey the restrictions, so you need to have --allow-run. But is there any possibility of adding a sub-Deno that you can pass capabilities into from the parent? I don't know if capabilities are accessible programmatically. Are they? I mean, you can start web workers, so you can downgrade your capabilities and start web workers with fewer permissions than you started with. That can be a way of sandboxing some code, but that's another isolate, running in a separate thread, not in your current isolate. Yeah. I know TypeScript has been a little bit controversial in Deno. I don't know if it's actually configurable, but there's been some churn with regard to TypeScript. I'm wondering if you could talk about your opinions on TypeScript for a sufficiently complex project: are you happy with it, and would you still use TypeScript if you were to do this again? I mean, I love TypeScript as a user; I think we all do. It definitely allows you to make more complex JavaScript
programs. I love JavaScript too, because I'm somebody who just wants to mash my keyboard sometimes and get something done, but projects grow beyond a single file and get complicated, and I think TypeScript has proven really useful at that level. Generally I'd say it's good. But we're taking two big compilers, tsc and V8, and smashing them together, and that impact zone has meant a lot of late nights for me, trying to get it all to work smoothly together. V8 is very fast and tsc is very slow, so there are technical problems like that; just the fact that tsc is so slow. I do kind of think that if I did Deno again (which I wouldn't, honestly), if it were a couple of years ago and I was thinking about starting this project, there's enough work to be done just on the JavaScript side, and the TypeScript side of things could have been done on top of that, and it might have been a bit cleaner that way. But I think what's really nice is that all of our standard library, and the ecosystem that has come out of this, is all in TypeScript, so we have a really nice code base where everybody's very careful with what they're putting out there, and everything's linted. So I think it's been a good thing; it's just a lot of complexity. Winding back, though I had to double-check my sources before commenting on this: on the question of module security, there's been some discussion of a secure module spec, which I think in TC39 falls under the header of Realms, and the conversation around that has not actually been showing positive signs lately. Domenic Denicola recently was commenting
on process boundaries being the only place where you can effectively enforce security, so the idea of having any kind of security isolation of individual modules without leveraging process boundaries may not end up being feasible in any environment. Yeah. I hear stuff like this and it's like, okay, it's not something that we can ship out to people immediately; there's still research going on, and who knows, maybe it's possible. But that's not really the level we work at. We're just trying to take V8 and ship something useful, and the security stuff, --allow-read, --allow-write, is just really easy things we can do to improve the productivity of users. It's just really nice to be able to run a script and say, hey, you only have network access and that's it; you can't access my environment variables. All right. We have somebody in the chat claiming they work on SES, so Kris Kowal, if you have more to add to this, feel free to dump that in there. Oh, Kris, hi. He's been working on this forever too. So I think Dominic Tarr had a question. I don't know if he has some lag or something, so maybe I'll just ask it for him, or do you want to go for it, Dominic? I don't think your mic is on, Dominic. Dominic's in Hawaii, I assume? On a sailboat in New Zealand, right. Okay, I'm just going to ask it; I don't know if his audio is working. He's asking: if Rust is so fast, why write a JS runtime with it instead of, like, a scripting language that compiles to Rust? We don't have a scripting language that compiles to Rust; this is just V8. Maybe a way to restate that question is: if Rust is so great, why use JavaScript at all? That's something I ask myself quite a bit. I'll
tell you a couple of things. One: Rust is hard. So hard. It's very painful and slow to work in Rust. I love it, but it's not something that you just hammer out, like, oh, I'm going to take some AWS API, smash it into MongoDB, and then do this. No, it doesn't work that way; it's very precise, and sometimes you don't want to be precise when you're programming. Sometimes you want to be fast and sloppy, and I think dynamic languages are really useful for that. There are times when you just do not care what the runtime cost is at all. If you're running a database, you care a lot about what the runtime costs are, and you should be spending the time to write that in Rust. If you're writing a runtime like I am, you should also be spending some time figuring out how to write that in Rust. But if you're writing, what was it, Sherlock, the CLI program that prints out people's usernames on different social networks, nobody cares; just do it as fast as possible. So this is trying to be a system for doing things when you just want to go fast, and I think there are a lot of use cases where that's appropriate. Totally. I apologize if you covered this already (and it looks like Dominic's actually got to go this time), but does Deno have a native module system? Yes, it's called plugins. We're going more for web APIs; we're trying to be as close to web APIs as possible, so we're more focused on adding things built into the runtime rather than extending the runtime, but yes, this is a thing. And with the web runtime target, you're looking for orthogonality, right? You want to be able to... is that the word that people are
using for that? Being able to run a web script in the Deno environment without having to modify the code, like no Browserify stuff. I forget the word they use. Isomorphic. Isomorphic, that was the word. Yeah, something mathematical. Yeah, that's the goal. We try as hard as possible, even though a lot of these web APIs are totally ridiculous. fetch is not the most ideal API, but we implement it anyway, because nobody cares; you just want to use fetch, and nobody cares about the exact details of how streams are implemented. So yeah, at great pain, we implement web APIs, even though we disagree with most of them. Well, having done that, and having also worked with Node, how do you feel about the web APIs comparatively, other than the chore of getting them implemented in the first place? Some are hit and miss. I think ArrayBuffers are pretty good. I think fetch is pretty good. I think the whole streams thing was a big failure. I don't know enough about it, actually, but it's vastly overcomplicated in my mind. If Dominic were still here, he'd probably have something to say about that; he's done, what, five different streams implementations at this point? JS cannot win when it comes to streams; we're cursed. Yeah, on the streams. I think in many ways this is my fault, for the initial implementation of TCP sockets in Node, which had a buffered setup: you can write, write, write to TCP streams, and it just buffers them up in process. That's not how the syscall interface is; at some point you get "try again": the kernel buffer is full, and we can't write to this again. So there was that mistake, and then the reading mistake was that we had this data event
on TCP sockets that just bubbled up whenever you had new data. So by default the TCP stream was constantly reading new data, and then you could pause it if you didn't want to receive any more. This is totally wrong; this was a major mistake in the design of Node's TCP sockets. Once you have async/await, it's really useful and very obvious that you should ask the operating system when you want to read from a socket: you say read, you await it, and then you read again. All sorts of problems happen from this design, and even though it was done in TCP sockets, I think a lot of other systems in Node ended up inheriting it, and there's been a lot of confusion because of this design choice, which was supposed to make things easier but ultimately did not. Well, in your defense, they didn't have async/await at the time, so... That's true. The design was sensible for the time. Yeah, go ahead. I mean, not in my defense; I would say it would have been better to just stick with the POSIX syscalls and not try to invent that. It was a conscious decision to deviate from the POSIX socket semantics, because I found them hard; I found it really hard to deal with that. And that was a mistake. Well, the POSIX calls are hard to use as a user; they require writing loops to retry around them and stuff like that. So are you saying that maybe the alternative would have been to expose them as-is and leave it to userland to figure out a nice way to wrap them, so it wouldn't have forced the ecosystem to adopt the design you came up with? Yeah. Buffered writes are something that you could have done on top of a write
interface that you know didn't always succeed for example right and this is what this is what we've done in deno not that anybody cares but if you kind of dig down into it you'll you'll you'll find that that our reads and our writes are are optimal and uh posix-y if that's a word so so we're way over time but this is really fascinating so maybe like let's do another question and then we can go to the happy hour part uh but i don't i do want to sort of keep roughly so for like 20 minutes past time right now so uh does anyone have a final question that they think is good or something from the chat anyone just go for it i i do okay um package management do you see the evolution of that just building on top of um url imports and people will end up versioning that and then that's your package management or do you see a role for a new package manager to step in i i think we're still kind of figuring out so i you know i'm i'm my idea with these urls is that the system can have a very simple idea of of how to like all like it's very easy to explain to other people and and for other systems to kind of interact with this with this url view of the world it's also a web standard this is how websites work um and so i just kind of like stuck hard with this uh without really any thought as to how this this actually might be uh used in in kind of in in kind of like larger projects and you know uh yeah in in the in the way that that people use npm or cargo and i what has developed is is some patterns that i think are are pretty interesting and are pretty usable i'm not sure i think there's still some some kind of open questions but what people tend to do is they have like a deps.ts file and so they don't people don't generally you know say you have 10 files in your project you people generally don't put the absolute urls in every single file they they kind of put those all into a deps file and then re-export the the things that they're using from there and so you you end up
having one place where you update and you don't have to update all of your code so you you you kind of have this this uh you know coding it's it's it's essentially a convention for for kind of where you lay out your your your imports uh you know the i think the the problem that is not solved is is kind of the the semantic upgrading and kind of uh diamond dependency problems so so you know i depend on a and b and a depends on uh c version 0.01 and b depends on version sorry this is too complicated b depends on version 0.02 right so you know supposedly with semver you know what what you kind of want is is to dedupe that c dependency if you guys get what i mean by this way this diamond dependency where one thing depends on two things which which then you know ideally depend on one thing but you you tend to have this kind of fanning out effect because everybody depends on slightly different versions of other things um this is a hard problem to solve uh i think in some some ways we've solved this by having this standard library and so you know in node you you there's so many little utility functions and so you know you kind of have this fan this really exponential fan out of dependencies and in deno you kind of have a fan out and then it kind of everybody kind of goes back and terminates at the at the standard library because the standard library has no external dependencies you know it implements a bunch of the you know left-pad or whatever the hell and so this this all kind of uh is less of a problem in our system but we still have the problem where like oh i depend on the standard library version 69 and you depend on the standard library version 70 and so you you end up having duplicated code in your in your dependency tree uh this could potentially be alleviated at the server level by doing kind of semantic version redirects or something so imagine you linked to version 0.70.0 it could redirect you to version 0.70.1 if if such a thing came out but yeah this is getting a
little complicated and uh yeah anyway i i um i'm not sure if if extra tooling is available it seems like people are able to build pretty complex programs with with the system that we currently have and it's pretty simple which is nice and maybe maybe that's actually um you know worthwhile just just to just like ah whatever you know modules are urls let's not add anything else and that ends up being being kind of nice but yeah very very potentially like some tooling might might emerge to kind of handle this in in some better way super cool well i wish we could keep going but we got to go to the next part of the event uh sorry to everyone for being over time but this has been really amazing um so uh what we're gonna do now is uh we're gonna do social happy hour and the way it works is we're all gonna join a thing called speakeasy.co and we're gonna uh hang out so speakers and attendees are all gonna join and you're gonna be matched to a random attendee for a five-minute chat and uh you're gonna be given like a fun sort of conversation starter question that you can use if you don't know what to talk about uh feel free to ignore it if you if you have a question in mind and this is a chance for the attendees to hopefully you get matched with the speaker if you have a burning question this is your chance to ask rod or ryan um whatever is on your mind and we'll probably just hang out there for the next 30 minutes or so and uh and uh uh you know just uh uh hang out and chat so um that's the url there on the screen speakeasy.com.js so we're gonna go over there in another minute um does anyone else have anything final to say thanks ryan that was really cool cool thanks thanks for having me thank you it's fun yeah thanks everybody for coming thanks to the panelists uh for being part of it and we'll see y'all over at uh the speakeasy for the happy hour okay see ya
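Ryan's point about reads (ask the operating system for data when you want it, then await the result, instead of a data event that fires on its own until you pause it) can be sketched in TypeScript. This is a minimal illustration under assumptions, not Deno's or Node's code: PullReader is a hypothetical interface modeled loosely on the read(p): Promise<number | null> shape that Deno's reader interface used at the time (a byte count, or null at end of stream), and MemorySource is a made-up in-memory stand-in for a socket.

```typescript
// Hypothetical pull-style reader, modeled loosely on Deno's reader shape of
// that era: resolves to the number of bytes read, or null at end of stream.
interface PullReader {
  read(p: Uint8Array): Promise<number | null>;
}

// Hypothetical in-memory stand-in for a socket that hands out data in chunks.
class MemorySource implements PullReader {
  private data: Uint8Array;
  private maxChunk: number;
  private offset = 0;

  constructor(data: Uint8Array, maxChunk: number) {
    if (maxChunk < 1) throw new Error("maxChunk must be at least 1");
    this.data = data;
    this.maxChunk = maxChunk;
  }

  // Copies up to min(p.length, maxChunk) bytes into p; resolves to null at EOF.
  async read(p: Uint8Array): Promise<number | null> {
    if (this.offset >= this.data.length) return null;
    const n = Math.min(p.length, this.maxChunk, this.data.length - this.offset);
    p.set(this.data.subarray(this.offset, this.offset + n));
    this.offset += n;
    return n;
  }
}

// The consumer sets the pace: nothing is read until read() is called, so
// there is no pause() step and no data piling up between reads.
async function readAll(r: PullReader, bufSize = 4): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  const buf = new Uint8Array(bufSize);
  for (;;) {
    const n = await r.read(buf);
    if (n === null) break; // end of stream
    chunks.push(buf.slice(0, n)); // copy, because buf is reused
  }
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const c of chunks) {
    out.set(c, pos);
    pos += c.length;
  }
  return out;
}
```

Because data only moves when read() is called, backpressure falls out of ordinary control flow rather than needing a separate pause/resume protocol.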
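Ryan also suggests that buffered writes could have been a layer built on top of a write call that does not always succeed, the way a non-blocking POSIX write can accept only part of the data, or fail with EAGAIN, when the kernel buffer is full. A minimal sketch of that layering follows. KernelSocket (a simulated fixed-size kernel buffer whose tryWrite returns how many bytes it accepted, possibly 0) and BufferedWriter are hypothetical names for illustration; neither is a real Node or Deno API.

```typescript
// Hypothetical model of a non-blocking, POSIX-style write: it accepts at most
// the free space in a fixed-size "kernel" buffer and reports how much it took.
// Returning 0 is the moral equivalent of EAGAIN ("try again").
class KernelSocket {
  private capacity: number;
  private pending: number[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  tryWrite(p: Uint8Array): number {
    const n = Math.min(p.length, this.capacity - this.pending.length);
    for (let i = 0; i < n; i++) this.pending.push(p[i]);
    return n;
  }

  // Simulates the peer draining up to `max` bytes from the kernel buffer.
  drain(max: number): Uint8Array {
    return Uint8Array.from(this.pending.splice(0, max));
  }
}

// Userland buffering layered on top: queue whatever the kernel refused and
// retry on flush(). Callers who want backpressure can use tryWrite() directly.
class BufferedWriter {
  private queue: Uint8Array[] = [];
  private sock: KernelSocket;

  constructor(sock: KernelSocket) {
    this.sock = sock;
  }

  write(p: Uint8Array): void {
    this.queue.push(p);
    this.flush();
  }

  // Pushes queued bytes until the kernel stops accepting them;
  // returns the number of bytes still waiting in the queue.
  flush(): number {
    while (this.queue.length > 0) {
      const head = this.queue[0];
      const n = this.sock.tryWrite(head);
      if (n === head.length) {
        this.queue.shift();
      } else {
        this.queue[0] = head.subarray(n); // keep the unwritten tail
        break; // kernel buffer is full: try again later
      }
    }
    return this.queue.reduce((sum, c) => sum + c.length, 0);
  }
}
```

The raw interface stays honest about partial writes, and buffering becomes an opt-in layer rather than a behavior every caller inherits, which is the alternative design discussed in the conversation.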
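The server-side semantic version redirect Ryan floats (a link to 0.70.0 being redirected to 0.70.1) can be sketched as a pure function. This is hypothetical, not something the transcript says deno.land does: resolveRedirect and redirectUrl are illustrative names. The policy is also an assumption: only newer patch releases within the same major.minor are allowed, which is the conservative reading of semver for 0.x versions. The example URL in the usage simply follows the std@<version> pattern of deno.land module URLs.

```typescript
type Version = [number, number, number];

// Parses a plain "x.y.z" version string.
function parse(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain x.y.z version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Hypothetical policy: for a request of x.y.z, serve the newest published
// x.y.w with w > z; otherwise serve the requested version unchanged.
function resolveRedirect(requested: string, published: string[]): string {
  const [maj, min, patch] = parse(requested);
  let best = requested;
  let bestPatch = patch;
  for (const v of published) {
    const [a, b, c] = parse(v);
    if (a === maj && b === min && c > bestPatch) {
      best = v;
      bestPatch = c;
    }
  }
  return best;
}

// Rewrites the "@x.y.z/" segment of a module URL according to the policy above.
function redirectUrl(url: string, published: string[]): string {
  return url.replace(/@(\d+\.\d+\.\d+)\//, (_match: string, v: string) =>
    `@${resolveRedirect(v, published)}/`);
}
```

With published versions 0.69.0, 0.70.0, 0.70.1 and 0.71.0, a request for https://deno.land/std@0.70.0/path/mod.ts would be rewritten to std@0.70.1, while 0.69.0 stays put. A policy like this collapses patch-level duplicates, but it still leaves 0.69.x and 0.70.x as separate copies, which is the harder diamond-dependency case described above.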
Info
Channel: Feross
Views: 2,177
Id: r5F6dekUmdE
Length: 82min 1sec (4921 seconds)
Published: Sat Sep 26 2020