ElixirConf 2021 - Brooklyn Zelenka - The Jump to Hyperspace

Reddit Comments

I saw the slides for this the other day and have been waiting excitedly for the video ever since. In short, Brooklyn looks at a bunch of stuff that might impact how we build web applications in the future, with a particular focus on how Elixir can help us.

u/mischov, Oct 22 2021 (1 upvote)
Captions
All right, hi everyone. Thanks for staying to the very, very end of the in-person portion. I hope your brains are all full of stuff from the last two days, and it's going to be my job to take that very full brain and hopefully expand it a little bit. We're going to be talking about the jump to hyperspace: light speed, anti-entropy, and moving past the cloud. So let's jump in.

We've been building apps essentially the same way for about 30 years. We have this one hammer and we're just hitting everything as if it's a nail, and my intent over the next just-under-an-hour is to Kool-Aid-Man through the wall, give you a glimpse into other ways of doing things, and give you some other tools, or at least show you where the other tools are. This is important because we're about to see a fundamental change in how networks work, in use cases, and in all the things that go with that. It's estimated that by 2025, which is shockingly soon, about three quarters of data will be processed outside of the cloud or data centers.

My name is Brooklyn Zelenka. You can find me anywhere on the internet as expede. I'm the CTO at a company called Fission, where we're an R&D company focused on distributed systems. We also have some products that we use to actually implement the R&D for real, and that forms this nice virtuous feedback loop where we get further ahead, try it out, see what works, and then go back. We're currently doing infrastructure and SDKs for edge apps, which require no back end whatsoever without limiting the functionality that you have in an app. So you can do everything you could do in a Phoenix app, but directly in the browser, without having to write or maintain a back end. Everything is local-first, end-to-end encrypted, encrypted at rest, distributed, passwordless, and user-owned.

Of course, this leads us to do a lot of work in standards groups, including the Decentralized Identity Foundation, where we work on distributed identity, distributed auth, and encrypted data stores. I used to do a lot of work on specs for Ethereum, kind of all over the place but especially the virtual machine. I stepped away from that for a few years and lately have been getting sucked back in, because the whole blockchain space is this black hole that pulls everyone in eventually. There's also Multiformats, which is a wire protocol, and some others. In this community I'm mostly known for writing some libraries, including Witchcraft, Algae, and a few others, which port ideas from Haskell, OCaml, and a few other languages into Elixir, and Exceptional, an error-handling library that essentially solves the same problem as with syntax but gives you a pipeable interface and more understandable errors.

So this talk is going to be about the R&D that we're doing. I was a little worried before arriving that R&D can sometimes be too far in the future and disconnected from what people are doing today, because they won't actually see the benefits for a few years. Then I started attending some of the talks and thought, oh, this is not going to be a problem whatsoever, because people are going right up to where we're picking off. It's a future-looking, emerging space, but I'm keeping it all to things that either already work and people have in production today, are about to go into production, or where there's significant investment already happening, both at Fission and at a bunch of other companies, including Protocol Labs, Microsoft, Ink & Switch, and a whole bunch of others.
The rough space we're going to be talking about, depending on your exact background, can be called edge, Web3, or DWeb. Essentially, we're taking a bunch of the concepts that we use on the BEAM and taking them up one level, to work on the internet itself directly. The environment that we're in shapes our solutions and our architectures, so how did we get here?

I would like to say that 1994 set the tone. You have to pick some year, and 94 seemed like a good one: the W3C was started; Netscape Navigator was launched, which was eventually rewritten and turned into Firefox; the Border Gateway Protocol was first used on the internet, and we all know how that went last week; and we were already doing things like plugging cellular data from a phone into a laptop and doing mobile computing.

The picture was very specific in 1994. You'd buy a desktop. It was expensive, it was kind of slow, and you'd have some data that you wanted to store off of your system so that it might be accessible to the machine that you have at work, because you're going to turn this computer off, get in a car, drive somewhere, turn on another computer, and connect from there. Or maybe you want to share with somebody else. That means it needs to have a process running on it so that we can connect to it. This is going to be a relatively powerful machine, because you're probably renting it from someone; maybe it has a Pentium in it or something really strong. We call this a server, and it is assumed to be at a distance.

This works really well. So well, in fact, that you're not using it to its fullest potential, so a bunch of other people show up and start sharing the same machine. That's great, but now all of your files are interlinked, they're on the same machine, so we have to do some access control, so you can't read my email and I can't read yours. That works really well for a long time, until that machine gets full and we have to scale out horizontally. We put a load balancer in front of it, we do the access control on each of these processes that we're handling in parallel, and we do some data sync in the background. We want to be able to scale up and down as needed, so we add some abstraction and call that the cloud. Increasingly this is being called serverless, which is ironically more servers, not less, but you don't have to manage the servers anymore.

And so it was this way for many years. It worked really well, and it's continuing to work very well, but there are some natural consequences that fall out of this. One is that we have an assumed single source of truth: we call it the database. Even if it's replicated, it's the database. Everything is server-centric: there is the server and there's the client, and for the most part the two never overlap. We've been building essentially LAMP-style architectures since the 90s. We've swapped out MySQL, some people still use it, but mostly for Postgres at this point; instead of PHP we're using Elixir; but it's essentially the same basic idea. Because we want that to continue, we need different abstractions so that we can take it, put it on the cloud, and scale it up and down; we want the thing that works on our machine to work on the cloud and be scalable. So now there's also DevOps, front end, back end; there's this huge towering stack of things to learn. How do we even train enough engineers to do these things? We literally cannot currently train people fast enough to enter the industry.
And finally, we have a hegemony in infrastructure. The most recent Stack Overflow developer survey showed 60% of people had projects on AWS, and it's really fundamentally these three: AWS, GCP, and Azure. There's a long tail of others, but for the most part it's those three. As a consequence, for most apps, unless you've already moved to the edge, if you're trying to send a message from Austin to Dallas, it goes out to the closest data center, which I believe is in California, and then does this loop back. Which is fine for sending an email; it doesn't have to be super snappy. But in aggregate it's a lot of time, energy, carbon, and network congestion.

This is AWS's map of where their data centers are; I'll just highlight them so the people in the back can see them better. You can really tell where the money currently is: North America and Europe, for the most part. It would really suck to live here, here, or here, but that's fine, because nobody actually lives there. Except that they do, and they're rapidly coming online. They weren't online in the 90s or the early 2000s, but they are now. Just in this little circle is 50 million people. By contrast, in North America we have about 50 million people per data center; in South America it's 435 million; and in Africa it's 1.4 billion. And everybody forgets about New Zealand, but they don't have their own data center, so up until very recently they had no choice but to use an underwater cable.

So we have this new environment. What does that look like? The internet used to be a convenience: you'd have a bookstore, you'd put it online, and you could reach more people and make more money. Now, if your bookstore isn't online, you're default dead. We assumed that all the compute was happening in a data center, and now we have extremely powerful clients; I have an M1 in this MacBook Air that just screams. There's IoT and other devices: Tesla is doing pre-processing of data to train machine learning models in the car before sending it off, and people are even putting chips in things like shoes. We're about to get smart glasses. Access used to be at a desk; now it follows you around everywhere. The bottleneck used to be bandwidth and is increasingly becoming about latency. And the market was, I'm not even going to say North America, really the US and Europe, and it's increasingly global, and over the course of the next couple of decades, hopefully, space.

We're already starting to bump into some of these limits at the edge of what people are trying to do. I won't go through this entire list, but there are two I'll call out. The first is remote surgery: they're literally doing surgery at a distance of 3,000 kilometers, or just under 2,000 miles, in China over a 5G network. On one side is the surgeon, on the other side is a robot, and the latency is low enough that they can actually perform brain surgery successfully. This means you can serve a lot more people with the same number of surgeons. The other one that might be interesting to the people in this room is location transparency: when you have low enough latency and enough bandwidth, if you have a heavy process, or just an overworked machine, you can transparently send that work to another machine that has capacity, stream back the outputs, and continue working without the user knowing that it's not running locally. Andreessen Horowitz, which is a major VC firm, thinks that sensor data in particular is going to kill the cloud, because the cloud is not set up to handle the volume or rate of data that we're about to start producing. Sensors are going into everything.
They see us coming back to a peer-to-peer model, not unlike distributed computing, which is how the internet actually started. The internet is decentralized; we've gone and centralized it. The good news is that we're really good at that, and that we're going to need to move to a distributed programming model, and we're really good at that too, so we're already ahead in this community. Is it a slam dunk? No, but it's pretty close. We already understand how to think in these models, and we're already adopting a lot of these techniques.

So the world is changing. Han Solo famously said that he made the Kessel Run in under 12 parsecs, which is nonsense, because that's distance, not time, and Disney went and made a rather terrible movie to explain that one off-hand line. But actually he kind of has a point, because when you're going at the speed of light you can't go any faster, so the only way to do faster round trips is to shorten the distance. There's a fascinating true story about a company a few years ago that wanted to do faster stock transactions and edge out the competition, so they could have information sooner and react to it faster. They built essentially a trench from, I believe, New York to Chicago, narrow and straight, and laid fiber in it. Now you don't have to go around a mountain; you tunnel through the mountain, and you gain back microseconds. Literally microseconds. We're getting to the point now where that was maybe the first application of this, and it's moving into many more.

Latency is a physical limit. Technically speaking, so are bandwidth and storage, but we're nowhere near hitting the physical limits of those: when you saturate the amount of information in a region of space, it becomes a black hole, so we have plenty of room to grow there. The speed of light in a vacuum is the speed of causality, and it actually goes slower in fiber or in other media. At about 100 milliseconds and below you start to want edge applications or edge infrastructure; the edge really dominates at 40 milliseconds and below, and really shines at 8 milliseconds and below.

This is the obligatory Ericsson slide from any BEAM conference. It shows end-to-end latency on the y-axis and the failure rate on the x-axis, and just to highlight it again for the people in the back: as the failure rate goes lower, meaning things are actually getting more reliable, we get a little bit more freedom in time, because maybe we have to do more round trips and it's just not practical to do them in less time. A couple of use cases they're looking at for ultra-reliable low-latency computing are factory automation, which has to be very, very reliable and very, very fast; the smart grid, so buying and selling energy to and from consumers, for example if you have solar panels or a Powerwall and you want to sell back to the grid; and autonomous guided vehicles, so self-driving cars. All of these need to be under this curve.

Eight milliseconds is a pretty abstract concept; we don't really interact with eight milliseconds very much. Because in the best possible case we're going at the speed of light, we can treat time as distance. So let's just flatten the earth, take away the atmosphere, pretend that compute doesn't matter, and go in a perfectly straight line. In eight milliseconds, one way, we can get from Austin to about San Francisco. Round trip, you'll get to about Atlanta. And in ideal fiber, maybe you'll get to New Orleans.
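To put rough numbers on that back-of-envelope picture, here is a small illustrative calculation in Elixir. The distances are approximate straight-line figures, and the assumption that light in fiber travels at roughly two-thirds of c is a ballpark, not a measured value:

# Rough speed-of-light budget for an 8 ms latency target.
# All numbers are approximate and for illustration only.

c_vacuum_km_s = 300_000     # speed of light in a vacuum, ~300,000 km/s
c_fiber_km_s  = 200_000     # in fiber, roughly two-thirds of c (assumption)

budget_s = 0.008            # the 8 ms target from the Ericsson slide

one_way_vacuum    = c_vacuum_km_s * budget_s       # ~2,400 km: about Austin to San Francisco
round_trip_vacuum = one_way_vacuum / 2             # ~1,200 km radius: about Austin to Atlanta
round_trip_fiber  = c_fiber_km_s * budget_s / 2    # ~800 km radius: about Austin to New Orleans

IO.puts("one-way in vacuum:  #{one_way_vacuum} km")
IO.puts("round trip, vacuum: #{round_trip_vacuum} km")
IO.puts("round trip, fiber:  #{round_trip_fiber} km")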
This means that we're trapped in these causal islands, where we can't communicate fast enough with each other. You can see this as a light cone: when an event happens, it can't have influenced something at a distance yet. As time goes on, that influence can ripple out, but that takes more time, so we have to set some limit on how much time we're willing to allow. For a lot of cases, having a couple hundred milliseconds of latency is completely fine, but increasingly, even talking about things like Phoenix LiveView, we want that to be really snappy, so we have to keep things very close.

There's a bunch of companies working on fixing this. There are 5G networks and Starlink. Starlink in particular really speeds things up, because now, instead of having to go through all of the fiber and deal with all of the geography, you go through fiber up to the ground station, hop up into space, do a couple of jumps around the planet, and come back down, and you get from one side of the planet to the other much, much faster. For both Starlink and 5G, they're putting points of presence, or PoPs, directly on the receiving towers, so the farthest you have to go to get cached data, or to do certain kinds of compute, is the tower you'd be communicating with anyway. The other one that really blew my mind when I heard about it last year is Walmart Edge. I'm not sure they've actually implemented this yet, but they made a bunch of announcements last year: as brick-and-mortar retail shrinks, they have lots of real estate and floor space, so they're filling it with servers, and because about 90% of Americans live within 10 miles, or 16 kilometers, of a Walmart, the farthest you have to go to get compute is 10 miles.

So there's now a new topology. The device in your hand is extremely powerful. If you're doing any wireless networking, you go out to a tower, and that tower has some compute and storage on it, and potentially a connection up to the satellite network. When we need a little bit more, we can go out to an edge data center, and when we need a lot more, we can go out to a cloud data center. If that cloud data center is on the other side of the planet, no problem, we can make some hops through low Earth orbit. Local-first gets us 99% of applications: you can do 99% of things without leaving the device. To anyone who's doing things with networking or the web this sounds wild, but when you look at native apps on your phone, that's how a huge number of them already work. You go up to the tower for real-time processing, storage, caching, and transactional processing; to the edge for relay, replication, consistency, and tasks like cron jobs; and out to the cloud for big, big jobs that aren't as time sensitive. So we have this really nice gradient, from literally zero latency, because it's happening on device, all the way out to pulling from terabytes or petabytes of information and crunching those numbers, and everything in between. It's no longer "I'm going to send it out to the server"; it could go to any of these, and I might not even have control over which one it's going to hit. We need to be able to compute anywhere, which is something functional programming is really good at. Chris has said to worry less about which database you're using and concern yourself more with how to replicate data in a distributed system in a way that the user doesn't have to care about how it's being done. We're given the tools in Elixir to do these things.
So LiveView, at a very, very high level, looks a little bit like this. There's the entire stack; Phoenix will take care of part of the client, and as we learned this morning, increasingly more of the client, all the way down to part of the database, and everything in between. We have a client device, maybe a computer running Firefox, and it will connect up to a process that runs on a server. We'll connect to that over WebSockets, and maybe that's even an inconsistent connection and we'll need to self-heal, and that's great. We want to communicate with another user, so we spawn another process, they connect the same way, and we have this really nice symmetric picture. It's very easy to think about.

Today, what if we took this view and flipped it fundamentally inside out? Suddenly we don't have a source of truth anymore; it's spread across all of these devices, but not all of these devices need the entire database, they only have some little slice that they actually care about. We can connect devices directly with each other over things like WebRTC or Bluetooth. This isn't saying that you can't go out to a server; you can, but it just becomes yet another peer in this network. Also, whether you're pulling data over HTTP or WebSockets or Bluetooth doesn't matter; we're completely transport-agnostic.

When you have more distributed systems, you deal with a bunch of problems. There's a famous list of the fallacies of distributed computing; we'll look at them very roughly, bucketed into three. We have these causal islands, which means we have no single source of truth; it doesn't matter how hard you're trying to replicate, there are more events happening already, and you're always behind if you're on the other side of the planet. If you're doing things local-first, you're running directly on device: how do you do coordination? What if you're disconnected? And you need to be able to run and replicate on any machine, on the PoP, on the satellite uplink, in a cloud server, directly locally, so your access control lists fail immediately, definitely for reads and for most writes.

So we need to make some trade-offs. I don't particularly like the CAP theorem, because when you take availability out, your service is just down. PACELC is my acronym of choice, and it says that when you have a network partition you have to choose between availability and consistency, and otherwise, when everything's running normally and connected, you have to choose between latency and consistency. There are definitely cases where consistency is important, but we're going to focus on the latency case: we're trying to get latency as close to zero as possible, as much of the time as we can. So this is our PA/EL system.

We have these new assumptions, which require a new approach, which means new features. When things are local-first, we can be much more efficient with the network and keep things off of the wire. When data can run anywhere, we free ourselves up to have commons networks, so you no longer have to rent a specific machine or a specific service; you can literally say, hey, who wants to run this job for me, and maybe it's Amazon, but maybe it's literally that old computer you have sitting in the corner collecting dust, and it doesn't matter once we free this from location. Ink & Switch has a really nice article if you want to hear more about local-first software specifically; I highly recommend checking that one out.
And we have a bunch of new tools. This is essentially just a page of jargon, but we're going to be talking about a few of them: CRDTs, differential datalog, and relative data views. The other one that I want to call out really quickly, just from conversations I've been having the last few days, is fully homomorphic encryption, FHE, where you can take completely encrypted data and compute over it. You can even encrypt the function that you're computing with, and still get back usable results. It's very useful when you want to, for example, do machine learning on health data without sharing all of your health data. So that's all stuff that's coming down the line.

Let's start with identity and access. You'd think that data is at the bottom, but in practical terms, solving identity and access first is where all of our systems get built. This is the picture today: we have some user, they want to access some process, and they get stopped part way. This bouncer checks their list and says, ah yes, it's the farmer, great, yep, you're on the list for this specific process, and this list contains all of the possible resources in your service, and then forwards the request on. This is really nice because we can break it down into three separate stages, but it also means that the two parties that are trying to communicate aren't actually in control; the one in the middle is in control, and all of the data you need to decide whether this is allowed or not lives there. So we have data centralization, which doesn't scale up very well and definitely doesn't work offline, and if you're trying to do this in a distributed system, then you at minimum have to replicate this list, and whether you can do that fast enough depends on the use case.

So instead, to do totally distributed auth, we use OCAP, the object-capability model, which is very similar to the actor model. Here's our actor, she's, I guess, playing a spy, and here's a mailbox that we're going to send some messages to. She has a process ID, and she's able to send a message, and that's all we need: to send a message to an actor we need to know the behavior and the location. OCAP is very similar. We have a process we want to talk to, and we have some address, maybe it's a process ID, a URL, an IP address, something, and we send our credential to it. This is not like a list, where you go to the bouncer and they check if you're on the list; it's more like having a ticket to go see a movie. They don't check your ID, and they don't know, from other stuff on a list, the other things you're allowed to do; they just literally go, oh okay, great, theater two.
Now these two are in control, and we don't need an intermediating process; all of the required info to complete this request lives in these two. We can make replicas of our messages, or of our credentials. In the actor model, if we want somebody else to be able to make a request, we hand them the PID, and presumably the behavior, and they're able to make requests. It's the same for OCAP: we can send them a copy of our ticket. This is where the analogy breaks down, I guess, because photocopying your ticket is probably not going to work, but you can copy your ticket, and you can give them some subset of the rights on it, so it says they're only allowed to see the first half of the movie, I guess, and then they're able to make these requests as well.

The other thing you can do in OCAP that you can't do, or can't do easily, with ACLs is rights amplification. The classic analogy here is that you have a can and a can opener, and maybe these come from completely different sources; they don't even come from the same company. One says here's a can, the other says here's a can opener, and when you bring them together something magical happens: you can get into the can. So you can get access to some resource, and access to the ability to read it, from different places, and then read the data.

It's also more streamlined. This is an OAuth flow; I haven't counted in a while, but there are maybe 11 or 13 steps. And this is OCAP, and this is actually being generous, too; it could be just one line, because all the information you need is contained in that one certificate, but we'll do a round trip to the user to say, hey, am I allowed to do this, and then to the resource, and they can respond back. It's universal, verifiable, and user-originated, and it doesn't even have to look that weird. Here's a version of it called a UCAN, which is encoded as a JWT. We have an ID in it; this is a decentralized identifier, or DID, which proves cryptographically that you have access to some private key, the same process as signing your Git commits. Then there's everything you're allowed to do: in this case, someone has granted me the ability to write to, overwrite in fact, their photos directory, and to send email from this email address. And the way we prove that, because it all has to be self-contained, is that we include another JWT, encoded, because these always go into base64, which shows that either somebody who had one of these and had been delegated to, or whoever owns that resource, granted your ID the ability to do these things. So you can just follow this chain back and check the signatures, and that's it; anyone can verify it.
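As a rough illustration of that shape, here is what a decoded payload could look like in Elixir terms. The field names are abbreviated and loosely follow the public UCAN drafts rather than quoting Fission's exact token format, and the DID and proof values are obviously made up:

# Hypothetical decoded UCAN-style payload (illustrative only).
%{
  "iss" => "did:key:zAlicePublicKey...",    # who delegated the rights (a DID)
  "aud" => "did:key:zBobPublicKey...",      # who may exercise them
  "exp" => 1_700_000_000,                   # expiry, as a Unix timestamp
  "att" => [                                # attenuations: exactly what is allowed
    %{"with" => "wnfs://alice.example/photos/", "can" => "wnfs/overwrite"},
    %{"with" => "mailto:alice@example.com",     "can" => "msg/send"}
  ],
  "prf" => ["eyJhbGciOiJFZERTQSJ9..."]      # proof chain: the encoded JWT that granted these rights
}

Verification is then just walking the proof chain in "prf" and checking each signature, exactly as described above.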
There is a special case for doing reads, especially offline, where you can't ask another process, hey, am I allowed to see this thing? So you have to encrypt the data directly and pass it around. This is really great, because now we can put this data literally anywhere, on all of these machines, in an untrusted way. The technique we use for this is called a cryptree; it's technically a DAG, a directed acyclic graph, and it looks like a file system. The files get encrypted with their own keys; each of them gets its own key, typically an AES key. When we go up into the directory, we hand it the key, so each directory has all of the keys for its children. Then we take that directory and encrypt it with its own key, again a different one, and hand that to its parent, until we get all the way to the top, and then we hand that top key to a user. So now, if we give somebody access to the top portion of the tree, they can see into that directory, into its children, and their children, all the way down. Or if you only want to share a photos directory, you only give them the key to that one, and they can see below it but not siblings or parents. You can give people access into a shared space without sharing absolutely everything, based on the one or two keys you've handed them. You can now share literally millions of files, or millions of rows in a database, by sharing a single key.
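Here is a toy sketch of that keys-follow-the-hierarchy idea in Elixir, using AES-GCM from the :crypto module. This is illustrative only and glosses over everything a real system like WNFS has to handle (content addressing, key rotation, serialization formats); the module and function names are made up for this example:

defmodule CryptreeSketch do
  # Each node is encrypted under its own random AES-256 key.
  # A directory's plaintext contains the keys of its children,
  # so holding one directory key unlocks everything beneath it.

  def new_key, do: :crypto.strong_rand_bytes(32)

  def encrypt(key, plaintext) do
    iv = :crypto.strong_rand_bytes(12)
    {ciphertext, tag} = :crypto.crypto_one_time_aead(:aes_256_gcm, key, iv, plaintext, "", true)
    %{iv: iv, tag: tag, ciphertext: ciphertext}
  end

  def decrypt(key, %{iv: iv, tag: tag, ciphertext: ciphertext}) do
    :crypto.crypto_one_time_aead(:aes_256_gcm, key, iv, ciphertext, "", tag, false)
  end

  # Encrypt a file node under its own key; return the key so the parent can hold it.
  def encrypt_file(contents) do
    key = new_key()
    {key, encrypt(key, contents)}
  end

  # A directory's plaintext is just a map of child names to child keys.
  def encrypt_dir(child_keys) do
    key = new_key()
    {key, encrypt(key, :erlang.term_to_binary(child_keys))}
  end
end

# Usage sketch: whoever holds photos_key can decrypt the photos directory,
# recover cat_key, and read cat.jpg, but learns nothing about siblings or parents.
{cat_key, _cat_node} = CryptreeSketch.encrypt_file("...jpeg bytes...")
{photos_key, photos_node} = CryptreeSketch.encrypt_dir(%{"cat.jpg" => cat_key})

%{"cat.jpg" => ^cat_key} =
  photos_key |> CryptreeSketch.decrypt(photos_node) |> :erlang.binary_to_term()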
So, data. Rob Pike, co-creator of Go, who also worked on the Plan 9 operating system at Bell Labs, says that data is the most important thing, much more than algorithms: when you have well-designed data, the algorithms just show themselves. The next three or so slides might be a review for some of you, but I want to make sure we're all on the same page before we go into even deeper stuff.

So let's look at gossip. We're going to take these three peers and we're going to mix paint. Time is heading off to the right, and off they go, making messages at different rates and mixing colors, and at the end we get to brown. Maybe not the most exciting color, but that's where we ended up. When blue heard about yellow, it created green, and then got kind of confused and sent green back, and yellow went, oh, that's great, I already have the yellow component of this, so blue plus green, great, I'm green now. We got some information out of order: we heard about green from yellow after it had sent us its original message, but everything was still fine, because yellow is already in the brown. Also notice that blue and red never directly communicated; they only went through yellow to transmit this information.

This is a very busy picture, but we can break it down with math, and not the scary math from high school, the fun math that we get to use in computer science, using abstraction. We're going to eliminate everything about the network, everything about time, and everything about the number of replicas, with three properties. We're going to make things commutative, which means it doesn't matter if you do blue or red first, we get the same result. Things need to be associative, so we'll have these three, and if I bracket them like this or bracket them like that, at the first step they're different, but once we get all of the messages together they're the same. And finally idempotence, which says we can do something once or many times and get the same result: here's empty, we fill it and it's full, we try to fill it again and it stays full, no matter how often we try.

So let's build a data structure. This is a very, very simple CRDT. It's called a positive-negative counter, or PN-counter, and it's a struct with two sets in it, one for adds and one for removes, and we're only going to add things to the structure. We need unique items if we want the numbers to actually change, so this is a quick-and-dirty way of generating random numbers. To do a count, we take everything that's in the adds and remove the elements that are also in the removes; even if there are elements in the removes that aren't in the adds, we only ever subtract from the adds group. Insert, as you might imagine, puts something into the adds, and, counter-intuitively, remove puts something into the removes list. So now we can insert items, say 42, 123, and 999, and then remove 999, and we have a count of two. Or we can repeat 123 a bunch of times, remove 99999, which we haven't even seen yet, insert 42, then insert 99999, which is fine because that cancels out the remove, insert 42 again, and get the count, and they end up the same. It's just a little bit of code, and we've handled all of that complexity.
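Here is a minimal sketch of that two-set structure as described in the talk. (A classic PN-Counter is usually a pair of per-replica increment and decrement counters; what's described here is closer to a two-phase set whose cardinality is the count, so treat this as an illustration of the idea rather than a canonical implementation.)

defmodule TwoSetCounter do
  # Grow-only adds and removes sets; the count is the size of adds minus removes.
  defstruct adds: MapSet.new(), removes: MapSet.new()

  def new, do: %__MODULE__{}

  def insert(%__MODULE__{} = c, item), do: %{c | adds: MapSet.put(c.adds, item)}

  # Removing only ever adds to the removes set; we never delete state.
  def remove(%__MODULE__{} = c, item), do: %{c | removes: MapSet.put(c.removes, item)}

  def count(%__MODULE__{adds: adds, removes: removes}) do
    adds |> MapSet.difference(removes) |> MapSet.size()
  end

  # Merge is a set union on both sides: commutative, associative, and idempotent.
  def merge(%__MODULE__{} = a, %__MODULE__{} = b) do
    %__MODULE__{
      adds: MapSet.union(a.adds, b.adds),
      removes: MapSet.union(a.removes, b.removes)
    }
  end
end

# The two orderings from the talk converge on the same count:
two =
  TwoSetCounter.new()
  |> TwoSetCounter.insert(42)
  |> TwoSetCounter.insert(123)
  |> TwoSetCounter.insert(999)
  |> TwoSetCounter.remove(999)
  |> TwoSetCounter.count()
# => 2

also_two =
  TwoSetCounter.new()
  |> TwoSetCounter.insert(123)
  |> TwoSetCounter.insert(123)
  |> TwoSetCounter.remove(99_999)
  |> TwoSetCounter.insert(42)
  |> TwoSetCounter.insert(99_999)
  |> TwoSetCounter.insert(42)
  |> TwoSetCounter.count()
# => 2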
So, syncing relational data seems hard, and there are a few things in this system that don't really work well at distributed scale. The relational databases we're still fundamentally based on go back to the first SQL system, System R, in the 70s, and we've just been slowly incrementing on that ever since. We tend to have this inserted_at column by default, and that pushes everything into a total order, but it's only the order for this one machine. Sometimes that's meaningful information, sometimes it's not, and it's hard to distinguish between the two. Also, if I'm trying to replicate this data across multiple machines, some of which I don't even trust, how am I going to do access control? How am I going to hide some of it? Even if I encrypt rows, they know that there's a bunch of rows missing in this particular table. Do I use a log? Logs work, but they can be very inefficient, especially when a new peer comes online and you have to send them the entire log, all of this historical data, which can be very lengthy.

So can we do better? What we really want is something like graph selection. In this case the green elements are things I already have on my device, the white ones I don't have and I want, and the gray ones are relevant information that's encrypted. Graph selection is good, but you do have to follow all of these individual links, so it's also somewhat inefficient. So what if we just got rid of the links and organized it as a set? Notice that now, just graphically, these aren't connected; this is just a bag of attributes. It doesn't matter what order we put them in, we can organize them any way we want, and we can even keep them in multiple different orders. Every entity, everything we want to talk about, has its own unique ID, and then an attribute, a value, and a time: when it starts, when it ends, or whether it's always true. This is just a bag of facts. If I say the Prime Minister of Canada is Justin Trudeau from 2015 onwards, it doesn't matter if I say that now, or in 50 years, or if I somehow had knowledge of the future and said it in 2010; it's always the case.

We can keep this data in different places. Some of it, like that one row, could live in an edge data center; these rows could be on this person's phone; and these could be on a different person's laptop. Syncing becomes very simple: we can say, hey, give me everything that matches these predicates, and it'll just grab them and send them over, and you can place them in immediately. In many log systems you'd have to figure out, okay, this is where they diverged, I'm going to have to roll back to this point and then play it over again. You don't have to do any of that anymore.
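A tiny sketch of that bag-of-facts idea in Elixir: facts are entity / attribute / value / time tuples kept in a set, so merging two peers' knowledge is a set union and "give me everything that matches these predicates" is a filter. The names here are illustrative, not a real library:

defmodule FactStore do
  # A fact is {entity, attribute, value, valid_from}; the store is just a set,
  # so order of arrival never matters and sync is a plain union.

  def new(facts \\ []), do: MapSet.new(facts)

  def assert(store, entity, attribute, value, valid_from) do
    MapSet.put(store, {entity, attribute, value, valid_from})
  end

  # Syncing two replicas is commutative, associative, and idempotent.
  def merge(a, b), do: MapSet.union(a, b)

  # "Give me everything that matches this predicate."
  def select(store, predicate), do: Enum.filter(store, predicate)
end

# The Prime Minister fact is the same fact no matter when or where it is asserted:
phone  = FactStore.new() |> FactStore.assert("canada", :prime_minister, "Justin Trudeau", 2015)
laptop = FactStore.new() |> FactStore.assert("canada", :capital, "Ottawa", nil)

merged = FactStore.merge(phone, laptop)
FactStore.select(merged, fn {entity, _attr, _val, _from} -> entity == "canada" end)
# => both facts, regardless of which device they came from or in what order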
And finally, universal compute. This is the furthest-out piece I'm going to talk about, but there are many teams working on it, and it's very exciting. We want to scale up our compute, and we want to do it with parallelization. In an ideal world, as we add more processes we just get more powerful, we just get faster, but that's not the case. We have Amdahl's law, which says that even just the overhead of starting up these processes, or pulling the end result of the map-reduce back, costs something, so we lose a bit of efficiency as we break the work out. In the real world the news is actually even worse: this is the universal scalability law, and it shows that you hit a certain point where you have not just diminishing returns, it actually gets worse. That comes primarily from incoherence and data contention: you're waiting for more data to arrive and it's stuck somewhere, or you're getting results that don't make sense because they're missing information in their calculations.

But we've found a way to do better than the ideal. It's done by taking every computation you have, turning it into a pure computation, hashing it, the function and the arguments, and storing that in a giant universal table in one of those databases I was just talking about, where you can sync different parts of it, and different people might be interested in different kinds of functions. Now we don't have to run that compute ever again. This is a technique from virtual machine design being applied at web scale; we call it the compute commons, one of the names for it at least.

Here's one design for how this could work. Pure functions are easy: we put the request on a bus, someone picks it up, runs the compute, and says okay, great, you tried to add two and two, the result is four, easy. Now anybody else who has those arguments can just look it up and not run the compute. When you have a pure effect, so something ambient but controlled by a similar system, maybe it's that database again, you can say, hey, whatever the current selector I have, take that data and append it to the key we're going to store this under. Now we know the state we started at; we've taken this thing that works on an ambient environment and turned it into a pure function, and we can make progress forward. Because both of these are totally pure, we can also undo them if something was done wrong. If you're handing this compute out to just anyone, do you trust them? Well, if they do it wrong, you can always roll back. And finally, side effects: we're going to send an email, or a tweet, or fire the missiles or something. We push this into the stream, we say this is how I called it and at what time, and this is the result I got back, and now we can record that as a fact in our database as well.

This can look a lot like a GenServer. Here's our handle_effect: we get the request in, and we ask, have I seen this one yet? Yes? Great, I skip the work and just look up the value. If we haven't seen it before and I don't have it in my database, it's okay if I run it, so we run it and insert it into the database. And finally, with the external functions, the side effects, we have to actually run them: I make a post to social media, get the result back, and then push that, now a pure element, into that same stream.
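Here is a minimal sketch of that shape as a GenServer. The real system content-addresses the function and its arguments and syncs the table between peers; here the "database" is just a map held in process state, and the module and function names are made up for this example:

defmodule ComputeCache do
  use GenServer

  # State is a map from {function_hash, args} to a previously recorded result.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Ask for a pure computation: reuse a recorded result if we have one,
  # otherwise run it, record the fact, and return it.
  def call_pure(pid, fun, args), do: GenServer.call(pid, {:pure, fun, args})

  # A true side effect (post the tweet, fire the missiles): always run it,
  # then record the call, the time, and the result as a pure fact.
  def call_effect(pid, fun, args), do: GenServer.call(pid, {:effect, fun, args})

  @impl true
  def init(table), do: {:ok, table}

  @impl true
  def handle_call({:pure, fun, args}, _from, table) do
    key = {:erlang.phash2(fun), args}

    case Map.fetch(table, key) do
      {:ok, result} ->
        {:reply, {:cached, result}, table}

      :error ->
        result = apply(fun, args)
        {:reply, {:computed, result}, Map.put(table, key, result)}
    end
  end

  def handle_call({:effect, fun, args}, _from, table) do
    result = apply(fun, args)
    fact = %{args: args, result: result, at: DateTime.utc_now()}
    {:reply, result, Map.put(table, {:erlang.phash2(fun), args, fact.at}, fact)}
  end
end

# Usage sketch:
# {:ok, pid} = ComputeCache.start_link()
# ComputeCache.call_pure(pid, &Kernel.+/2, [2, 2])  # => {:computed, 4}
# ComputeCache.call_pure(pid, &Kernel.+/2, [2, 2])  # => {:cached, 4}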
So where do we go from here? We have to remember that we have this new environment; it's not 1994 anymore. Data is fundamentally relativistic. There's no way to fix that; we have to work with it. So we need to embrace the distributed nature of the internet. Only replicate what you need to; don't put it over the wire unless you have to. Remember that data propagates relativistically, and free yourself with intrinsic time: just put the time directly on the data and make it part of the data model and the assertion. Thanks.

[Applause]

Any questions, or just stunned silence? Yes?

So, do we have this in production, and what's the latency like? Most of what I talked about we have in actual production, running. It's still early, again, it's R&D, so it works most of the time and we're always getting better at that. In terms of actual performance and latency, the latency between replicas really depends on which replica we're on, so that can be anywhere from single-digit milliseconds if you're in the same room, all the way up to you've been offline for a week and you reconnect, and that's fine. In terms of actually doing things like reconciliation on these data structures, we've found that, depending on how much you're trying to replicate, it's typically below what a user would notice. With some CRDTs you come online and you have to stop and sync, and that can take quite a while, especially if it's immutable, which all of our stuff is, but because we've changed this from being a log into a set, we can just dump it in and start computing over it immediately as it's coming in.

That's so cool. My question is in regards to memoization: have you done any research, or do you have any data, on what kind of cache-hit rate it takes to make that worth it, when does memoization make sense? So this is the simplified view, very much. We do a few extra things: we curry all of the functions, and then, especially if you're using something like WebAssembly, you can start to do hot-spot optimization on that function. If you're doing one plus one, yeah, just run it locally, it's fine, and in fact your runtime will probably already have those in a lookup table. Doing this dynamically, we've been looking at ways to actually decide whether to compute it locally and store the result, or send it off to the network, and the best we're able to do so far, I guess there are two ways: you can have somebody label it as "this might be an expensive computation, please send it off if somebody's willing to take it", and then we race locally versus externally; or you try to do some static analysis or complexity analysis on it based on the input coming in. I can talk about that after, too.

Yes, E definitely, and also some work that we did in Ethereum as well. Yeah, E and Pony were the big inspirations, direct inspirations, for OCAP, I think.

So the system itself doesn't make a constraint on that. There are a couple of different ways you could do this. The external system could be, you know, you're working locally on your phone and you have it connected over WireGuard to a desktop at home, and maybe that's the thing doing the computation, and you'll want to keep that for probably quite some time. Because we are content addressing, we're hashing, we can see how often this function has been called and just do cache expiry on it. But we think that, as this gets picked up more, it becomes more like an availability network, saying whoever's able to return the result fastest will win the contract to do the compute, whether or not they want to pay for it, and then having a large store of these means you can outperform everybody else.

Ah, can you? Right, that's interesting, I had not even thought about that. So can we then use this table to reverse-engineer the function we would need to get to the answer, basically almost decompile the answer? I hadn't thought about that; that's a great idea. I can definitely see there being a lot of use cases for that.

Anybody have a last question for Brooklyn? I think we're just soaking this in; my mind is completely blown. I'm going to watch this ten times and then I'll have a hundred questions for you. Can we get a huge round of applause for Brooklyn and this talk?

[Applause]
Info
Channel: ElixirConf
Views: 1,950
Id: ogOEEKWxevo
Length: 53min 43sec (3223 seconds)
Published: Fri Oct 22 2021