Trent McConaghy - BigchainDB: a Scalable Blockchain Database, in Python

Thanks, everyone. It's really an honor and a pleasure to be here once again. Also, just in case someone missed it before: someone is missing an iPhone, so if you are missing an iPhone, go talk to the tech guys over there and describe it. One hint: it's not pink or yellow, so you have to do a better job than that. I'm Trent, and it's really a pleasure to be here today. I will be talking about yet a different topic than I talked about last year, which was itself a different topic than the year before. Two years ago I was talking about Moore's law, machine learning and semiconductors. Last year I was talking about ownership on the Internet. This year I'm talking about something that some people might think is more mundane, but it's also super cool, and I don't think it's mundane at all: databases. The common thread between this year and last year is blockchains. I'll just get started.

This is probably a very elementary picture for most of you. If you think about the elements of computing, there are really three things: processing, storage and communication. If you drill into that some more and think about how it translates to modern application stacks, you'd probably say: okay, storage breaks into two things, the file system and the database. The file system is organized by hierarchies of directories and files; for the database, the UX is relational queryability. Under the hood, the file system is really more partition tolerant, whereas the database aims for consistency, in the CAP theorem sense. Off to the side, communication is just kind of there, the protocols that connect things, and you've got applications on top. Very straightforward, no problem. As for how this manifests itself in modern stacks, this is an example of the cloud stack. At the top level you've got applications, whether they're web-based or mobile-based: Facebook, Netflix, Google Maps, that
sort of thing. Wonderful. Underneath, they run on cloud platforms like AWS or Azure, and those platforms hold a few elements: the processing, the file system, and the database or databases. In the Amazon stack you've got EC2 for the processing; on the file system side you've got things like S3 buckets for blobs, media and so on, or maybe Google Drive; and for databases you've got things like MySQL, Postgres and other SQL databases, or MongoDB and other NoSQL databases. Straightforward. This is all dead obvious to you guys, I'm sure.

Well, there's this whole other universe that's been emerging for several years now, starting with Bitcoin. The Bitcoin paper was released in late 2008, the software came out in early 2009, and very quickly it got the label of "magic internet money". People had a tough time putting their finger on what exactly it was, and they still do to this day: is it money? Is it an asset? Is it just some weird crypto-anarchist dream? What the heck is it? Different things have emerged over time. There's been a lot of hype and a lot of technology developed, and one of the things that happened is that it really sparked a revolution. That revolution was people realizing that the ideas of Bitcoin, namely that you could have scarcity of something electronic, could be applied not just to the traditional idea of e-cash but to other things as well. So you could say: okay, what about other digital assets? What about digital art? What if I could own digital art the way that I own Bitcoin? This is actually where ascribe came from; my company started working on it about three years ago. Or what about diamonds, which aren't digital but are still something you can track through supply chains? We were just having that conversation. What about used cars? If you had a much better trail of the ownership of cars, on a database that no one
owns or controls, that's pretty powerful. So Bitcoin sparked this revolution, and it morphed into the blockchain revolution (I'll get into that more later, but a blockchain is really just a type of storage mechanism). More recently we're seeing things like smart contracts, and most recently DAOs, especially The DAO. If you guys haven't seen it, I encourage you to check it out: it has raised more than 150 million dollars as a crowdfunding, it's still barely halfway through its fundraise, and the creators of The DAO are here in Berlin. So it's pretty interesting; there are some crazy things happening, and it's really rewriting the internet for computing. I'll get more into that too.

So basically Bitcoin sparked this revolution, but Bitcoin itself and all of the other blockchain technologies have basically had one problem: scale. Bitcoin has a theoretical maximum of seven transactions per second, and when you start to go past about 1.5 transactions per second, the network backs up and it takes about 24 hours for a transaction to go through. People call the Bitcoin blockchain bloated, yet it only holds 50 gigabytes; I can hold more on my thumb drive. Yet people talk about Bitcoin and these other networks as planetary databases. Well, do you know of any planetary databases that are smaller than your thumb drive? Not a very good planetary database, if you ask me. So that's a challenge. What about planetary scale? Is there some way to reconcile the dreams of blockchains with scale, the technology that we need to achieve these dreams? You might ask whether there are other places that have planetary scale. A lot of you guys come from the big data world, so the obvious answer is yes, but let's start with an example: Netflix. Netflix uses 37% of the bandwidth of the internet, at least in the USA. That's pretty good; I'd say that's probably planetary-scale usage, given the usage of the internet today. That's
pretty good. What's at the core of Netflix? Well, big data: in particular, the Cassandra database instances that they're running. Cassandra databases are actually powering Netflix to serve up this media. This is an interesting plot: the x-axis is the number of nodes and the y-axis is the number of writes per second. The first thing you can see is that there's more than one node on the x-axis, so there's more than one machine being used to serve up this data; it starts at 50 on the left and goes all the way up to 350 on the right. With 50 nodes, the Netflix Cassandra instances are running at 200,000 writes per second, and as they increased the number of nodes from 50 to 150 and more, the number of writes per second increased correspondingly, in a linear fashion. If you think about it, that's actually pretty cool: it means linear scaling, and it's able to swallow more and more data. The core idea there, of course, is that no single node is storing all the data. The data is being sharded, so each node stores a subset, and when data is written, each node is only responsible for keeping track of its subset. But they have ways of keeping in sync, and that's what the next part is about.

So how does this database keep in sync? If I put some data here, some data there and some data over there, how do I know which data to look at? And what if one day a node goes down, and I need backups, and backups of backups, and so on? That's the idea of replication, of course. Within these distributed databases there's the idea of consensus: algorithms or protocols to maintain consistency among the databases at the log level. These algorithms actually go back to the 60s, when the ARPANET was being worked on, and a lot of
theory was developed in the 80s and 90s. Probably the most notable first major result came in 1982 from Leslie Lamport and his colleagues at SRI International, who came up with the concept of Byzantine fault tolerance and a theoretical solution to it. Byzantine basically means not only keeping things consistent but also handling malicious actors, agents that are trying to break the system. That came along and was kind of ignored, ignored, ignored; this was still the early 80s, and there weren't a lot of big networks. Then in 1990 Lamport came up with an idea he called Paxos and submitted it for publication. It got rejected, so he just let it gather dust, and it finally got published eight years later, when the world kind of realized, hey, maybe we should publish this after all. It was actually a really horribly written paper, but since then Paxos has become very, very famous, both for being really hard to understand (which isn't quite true) and as an important result for consistency among databases. Paxos is basically a protocol that's provably correct for maintaining consistency among the different nodes in a database, and you can implement it. That's where a lot of the implementations of the early 2000s came from: things like Apache ZooKeeper were originally based on Paxos. It turns out that implementing them required some deviations from Paxos, but the general ideas held true. So this idea of consensus has been around a long time, and it's what powers things like Cassandra.

So, going back to the blockchain world: blockchains are revolutionary, they can revolutionize stuff, but scale is really, really an issue. So people have been saying, okay, let's just scale up the blockchain, let's just scale up the blockchain. Well, a lot of you guys have done work in machine learning. Have you ever tried taking some sort of algorithm, your toy
algorithm that you played with, say some genetic programming, and said: okay, great, I'm working on ten variables, now all I need to do is scale up my genetic programming and make it work on ten thousand? Guess what: it doesn't fly. It never, ever flies. As you have probably seen with the research from Norvig and others, as you go to higher and higher scales the algorithms change, and they tend to get simpler, and simpler, and simpler. You have to keep the same inputs, you have to keep the same outputs, you want to maintain the same behavior, but you have to let go of your pet algorithm. It's really about the result. So if you take an existing blockchain idea and scale it up, maybe you get 10x if you're lucky, or 50x if you're really good, and that's what pretty much everyone has been doing. There were major debates over this in 2015, lots of tears, but that's what everyone is doing.

We started asking the question: what if, instead of big-data-ifying blockchains, you blockchainify big data? What if you start with something that naturally scales and give it the characteristics that blockchains have? Start with that; there's way more history of research in big data, thanks to database research going back decades. But then the question is: how do you blockchainify this? What does that even mean? A lot of people have big debates about what "blockchain" means. Fortunately we've been doing blockchain stuff since before it was cool. It was very uncool for a long time, and I don't care about cool or not; I just care about what's interesting to me. So I'm going to give you some definitions, leading towards a working definition of "blockchain" and of "blockchainify".

First of all, decentralization. Decentralization simply means that no single entity owns or controls the system. This is really useful for disambiguating it from the word "distributed". Distributed means the resources are
shared among more than one machine, whether they're processing resources, storage resources and so on. There's still a lot of confusion in the community: people use the term "distributed ledger", for example, when they actually mean "decentralized ledger", so there's a lot of miscommunication. But this is a good practical definition. You can have centralized distributed databases, which is what we have with, for example, the Cassandra deployments at Netflix. But we can also have decentralized databases, where no single entity owns or controls them.

Immutability: this is the idea that the system is more tamper-resistant than usual. Nothing is perfectly immutable. We already have logging databases, we already have automatic storage to magnetic tape drives, etc., so there are already degrees of immutability. But there are things we can do to make it even harder to remove what was written before: things like hashes of hashes of hashes, sort of like a snail shell getting larger and larger. This is one of the newer ideas that blockchains bring, although even that research goes back to the early-90s literature on time-stamping. So Satoshi didn't invent very much. A lot of people would claim he invented everything, and he did do a lot of other magical things, but there's really not a lot of magic; it's just good engineering with decades of history behind it. So that's immutability.

There's another really cool idea, and that is assets. Imagine you have some sort of storage medium where assets can truly live. For that to be the case, no single entity can own or control that medium, because otherwise they would effectively own the asset. So as a prerequisite you really need decentralization, and you also need the guarantee that those assets can't be arbitrarily deleted, so you need immutability. Once you have that, you can have assets that live on there: you can
issue assets, you can transfer assets, and so on. You can define ownership like this: I own the asset if I have the private key to that asset, which is essentially the password. So I enter the password, I transfer the asset to my buddy, then he owns it, he can transfer it, and so on and so on. So that's assets. These three characteristics together, decentralization, immutability and assets, are really what people talk about as the major benefits they get from a blockchain.

Now blockchain as a storage medium, as a noun. There is a much narrower definition of blockchain that goes back to the early 90s: literally this chain of hashes from the time-stamping literature. That's really pretty narrow, and no one really likes it; even though you could argue that it's the real definition, no one really bothers, because there's so much more value in the broader one. So blockchain the noun is some sort of storage medium that has decentralization, immutability and assets, and the adjective is simply the collection of these three sub-adjectives: blockchain the adjective means decentralized, immutable, and with assets. That's it. That's the working definition of blockchain.

Then what we can do is say: let's give these attributes to an existing big data database, an existing distributed database. This is how you blockchainify big data. First of all, the key, key, key thing is scale and speed, so you have to retain the performance of the big data database. You've got some sort of Paxos derivative, whether it's Raft or something else, that's solving order. You get out of the way of that: you let it keep solving order, and you don't try to sort out order on top. What that protocol is doing is naturally building a log of all the transactions, and once you have that log, all the other information can be built on top of it in a hierarchical fashion. So you let the database keep doing that. It's all about order.
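The point about letting the database fix the order and building everything else on top of the log can be sketched in a few lines of Python. This is a hypothetical, in-memory toy, not BigchainDB's implementation: the Log class, the CREATE/TRANSFER record format and the owner check are illustrative assumptions, the append order stands in for whatever order the underlying consensus protocol (e.g. Raft) settles on, and a real system would verify a digital signature rather than trust an "owner" field. Each entry also hashes the previous entry's hash, a minimal version of the "hashes of hashes" idea, so rewriting history becomes detectable.

```python
import hashlib
import json


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record (canonical JSON)."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Log:
    """Hypothetical append-only log. The append order is assumed to be the
    order already agreed on by the underlying database's consensus protocol."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, hash) tuples

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        """Recompute the hash chain; editing any record breaks its stored hash."""
        prev = self.GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True

    def owners(self) -> dict:
        """Replay the log in order to derive who currently owns each asset."""
        state = {}
        for record, _ in self.entries:
            asset = record["asset"]
            if record["op"] == "CREATE" and asset not in state:
                state[asset] = record["to"]
            elif record["op"] == "TRANSFER" and state.get(asset) == record["owner"]:
                # Toy check only: a real system verifies the owner's signature here.
                state[asset] = record["to"]
            # Anything else (a second CREATE, a transfer by a non-owner) is ignored.
        return state


log = Log()
log.append({"op": "CREATE", "asset": "diamond-1", "to": "alice"})
log.append({"op": "TRANSFER", "asset": "diamond-1", "owner": "alice", "to": "bob"})
# alice tries to spend the same asset twice; replay ignores it.
log.append({"op": "TRANSFER", "asset": "diamond-1", "owner": "alice", "to": "carol"})
print(log.owners())  # {'diamond-1': 'bob'}
print(log.verify())  # True

# Tamper with history: rewrite the first transfer so it goes to carol.
log.entries[1][0]["to"] = "carol"
print(log.verify())  # False
```

Because owners() is a pure replay of the ordered log, every node holding the same log derives the same ownership state. In this sketch that is why agreeing on order is the only hard part: the double spend is rejected simply because it comes second.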
And then, once you've made sure of that (it's sort of a constraint as you're doing this), you add in these characteristics. First, decentralization. With distributed databases you've got ten nodes, 50 nodes, whatever. What you say is: okay, each of those nodes has a piece of control, and collectively they have all the control together, but that's it. Imagine you say each node gets one vote. Simple enough, right? And it's actually as simple as that, in a theoretical sense. You call this a federation. Then you say: for any transaction to go through, a majority of nodes have to say yes, and for a transaction to get rejected, a majority have to say no. They just say no to the transaction, and that's it. Now, it turns out you can optimize this: instead of voting on one transaction at a time, you vote on a thousand at a time, which is faster. You group these thousand transactions into a block, and that's where the idea of a blockchain comes in, because you also have the hashes in the blocks. So you have decentralization via the federation of nodes; you have immutability via simple hashes on blocks and a few other related ideas; and for assets, it's basically about making it really easy to register an asset and really easy to transfer an asset, with digital signatures on top. A digital signature basically means that only I can sign this, and then I'm done. So at the core, core, core level it's really these three things. It's conceptually simple, though of course in the real world it takes effort to get there.

I'll describe a bit more of the architecture. Overall, this is what BigchainDB looks like. We built on top of an existing distributed database. We looked at quite a few and benchmarked quite a few,
including MongoDB, Cassandra, Elasticsearch and so on. The first thing we noticed was that they all scale wildly better than any blockchain out there. We knew that; not surprising. But then we filtered down on one of the main characteristics: how good is the changefeed mechanism? A changefeed means that instead of having to poll the nodes to see what happened, the database tells you what's happening in real time, and RethinkDB is really designed for that from the bottom up. So we chose RethinkDB; it's a JSON-style document store built in C++. In the architecture here, in the very center you see three RethinkDB nodes; that's probably obvious. Wrapping each of those nodes is a BigchainDB server node, so there's one BigchainDB server node for each of them. Then there's the client side, whether it's Alice or Bob or whoever, and Alice and Bob are running some client-side software too. That's quite straightforward; that's roughly the architecture.

One constraint we gave ourselves was related to letting the core database, in our case RethinkDB, do its job: don't add any new communication mechanisms beyond what RethinkDB already had, because it already has its own way of communicating. So if I want node 1 to talk to node 2, that has to happen via storage inside RethinkDB itself. What that looked like was the following two tables: one table for incoming transactions, and another table for the set of information that's being stored permanently, this sort of blockchain, the set of transactions about who owns what, etc. The incoming transactions come in on the left; they're immediately and deterministically assigned to a given node to handle, and they're grouped into blocks as they come along. Once there are enough transactions for a given block, for a
given node, that node writes the block to the table on the right. This is all within one RethinkDB instance, and other nodes are also trying to write there. And here's a key thing: when you write to the table on the right (W-R-I-T-E, writing to the table on the right; you guys know what I mean), you let Raft, the consensus algorithm, everything in RethinkDB, sort out the order. We don't impose an order; we let the underlying consensus algorithm solve it. So things get written on the right, grouped together, and once they're written, we vote after the fact, because we found that writing was a bottleneck. So we write, write, write, and then after the fact we ask: okay, this whole group of transactions here, is it good? Yes or no? Each node votes, and once a majority has said yes, or a majority has said no, you know whether or not that block is good. It's as simple as that. It feels very, very simple, and that's because, as you guys have probably seen in your own experience, when you do stuff at scale it's really the simplest algorithms that emerge. That's what you have to aim for: you have to constrain yourself to just the simple and the elegant. Obviously there are still many ways to keep things simple, so it's still a big design space, but this is what we arrived at.

So we were benchmarking. As we iterated on this, we said: okay, if we design the algorithm just right, we can get away with benchmarking just RethinkDB as a first cut, because everything else is getting out of its way. So we designed the algorithm so that the rest of it got out of the way, and we could focus on seeing how far RethinkDB can scale. In this set of benchmarks, we started with just one node, then added a second, a third, all the way up to 32
nodes, and we saw how the throughput, the number of writes per second, increased. You can see that as it goes from 1 to 32 nodes, the writes per second go past 1 million. This is with blocks of 1,000 transactions, so, roughly speaking, it's 1,000 writes per second times 1,000 transactions per block. This plot translates into a simpler plot, very much like the Cassandra plot from before: the x-axis is the number of nodes, the y-axis is the number of writes per second, and we get more than 1 million writes per second this way. This is on RethinkDB alone, knowing that if RethinkDB goes really, really fast, then everything else, all the algorithms around it, are designed to keep up too; we'll be releasing benchmarks on that shortly as well. Interestingly enough, as we were working on this we found four, five, six bugs in RethinkDB, and we worked with the RethinkDB team to help resolve them; the fixes often made a difference of 2x or 4x in speed, so this actually improved the speed of the RethinkDB database. We wrote a paper about this in great detail. Some people call it a book because it's about 70 pages, but it's really an easy read. And by the way, I will get into all the Python stuff in a bit.

To summarize what we built: one way of thinking about it is bringing together the best of two worlds, the world of big data distributed databases and the world of blockchains. What blockchains bring to the table is the concept of immutability, the concept of decentralized control and the concept of assets. What the big data world brings is basically scalability: throughput, low latency, high capacity, a permissioning system, querying, all this sort of thing. So if you're coming from the big
data world and you ask yourself why you would ever care about using a blockchain, or a blockchain database, it basically comes down to this: is there value in immutability, is there value in decentralized control, or is there value in assets? For example, say you're the music labels, and you're not very happy with how Apple is behaving: they're taking a 30% cut, and yet you don't have your own way of working together with the other music labels. What if you had a way to join forces: a database that you don't control, and that none of your competitors controls either, but that you own and control collectively? They tried doing this a few years ago and failed, because they couldn't agree on a technology; it came down to one of them controlling it. Blockchains allow them to have a technology that they control only collectively. So that's one example. In general, this idea of consortium databases is a pretty powerful concept. We also kind of view it this way: back in the day there were SQL databases, relational databases; then along came the NoSQL databases, which offered scale, aggregation and so on, or graph capabilities. This is a new type of database that's really emerging, a new blue-ocean database: a blockchain database, with immutability, decentralization and assets.

To give you a feel for how people use this: Everledger is one of our users, our customers, and they're doing this for diamonds. If you go and buy a diamond from some dude in the street, how do you know it's not a blood diamond? How do you know it didn't come from child labor? How do you know it isn't stolen? What if there were a registry for that diamond, managed by all the major certification houses of the world, the insurance companies, the mines, etc.? That is exactly what Everledger was doing, and so it's the diamond
supply chain. The diamond industry is about an 80 billion dollar industry. It's been estimated to have 40% fraud, and it's been totally opaque, so no one was detecting the fraud. We actually worked with Everledger: we ingested all the diamonds they had so far, as well as some eBay data, so we had almost a million diamonds, plus a month's worth of eBay USA data. We did machine learning analytics reconciling the data from five certification houses, which had never talked to each other before, reconciled that with the eBay USA data, and found a 7% fraud rate. That works out to 750 million dollars' worth of diamonds a year. This is an example of the value of blockchain: basically 750 million dollars of fraud that was going to criminals is now being prevented, thanks to Everledger and BigchainDB.

Another example is energy deregulation. RWE is the biggest energy provider in Germany, and the laws in Germany and Europe are getting deregulated. So if you have energy provider A, some guy with a solar panel farm, and another person with a wind farm, how do they sell that to the grid? Who's managing it? Do you want some centralized controller? Wouldn't it be cool if there were no centralized controller, and instead every single node could talk to every other node, with some way for markets to [unclear]? This is actually the promise of blockchain technology, and there's a whole bunch of initiatives working on it. We're working with RWE as well, to help manage the flow of dollars in energy deregulation.

Another example is in the world of health. Say I'm a doctor in London, and Bayer or some other big pharma company comes to me and says: "Hey, I'm gonna fly you to Hawaii for two weeks for this awesome conference. All you need to do is buy my drugs for the next two years, or subscribe to my medical journals for the next two years." And the doctor's like, "Well, yeah, okay." And
how do you know when that's legitimate and when it's not? It turns out that what the UK government did was create a law called the Sunshine Act: every time dollars flow between hospitals in the UK and any of the big pharma companies, or any of the big scientific publishers like Springer, that has to be transparent. That's a perfect use case for blockchains. Blockchains are transparency engines; no one owns or controls them. So we're working with Tension 90, a group of experts from the field, to shine a light on the flow of dollars among UK medical professionals, the big 20 pharma companies and the publishing industry.

One final example: the company I work for is called ascribe, and we started out (I talked about this last year) with IP on the blockchain. We still do this. We had a talk earlier today from Ryan, who was talking about 3D IP, and it's very related to this overall challenge. If I'm an artist and I create some digital art, how do I prove that I'm the owner? When I sell it to someone else, how do they know that it's me? Blockchain technology can help here, so at ascribe.io we're moving very quickly towards using BigchainDB for that as well. Overall, the art industry itself, digital plus physical, is a 64 billion dollar industry, and the copyright industry overall is obviously hundreds of billions.

So, Python. If you want to try BigchainDB, it's not much different from using other databases; we've gone out of our way on that. It's not like using the Bitcoin daemon or one of those, if any of you have ever tried: that's a nightmare, just horrible, horrible. So we've gone out of our way to make this really simple. On bigchaindb.com, if you scroll down from the top, there are some links to get a very quick start. The very first thing is that, obviously, we've open-sourced it: it's AGPL, it's on GitHub, and via the link to GitHub you'll see
everything there, all of our tickets, all that sort of thing. We've also gone out of our way to document this really well: we're using Read the Docs, and I'll drill into that in a second. To get going and playing with BigchainDB, you install it. We're also in the PyPI repository, so you can just do a simple pip install bigchaindb and you're good to go. So thanks, Sylvain; we have a few local Python experts at ascribe, and that helped keep things simple.

I'm just going to give an example of how to use it. BigchainDB itself is all written in Python, and RethinkDB is in C++, so we've got C++ at the very core and then Python. I'm sure a lot of you guys have had this experience: you shouldn't be prematurely optimizing by writing things in some harder, slower-to-write language like C. Write it in Python, and then wherever you have bottlenecks, that's where you start to optimize: the algorithms first, and then, if you really need to, you can optimize in some other language. That's our path as well. So, in this case, once you have BigchainDB installed, you also have to install RethinkDB, which is also simple. Then you get RethinkDB going, you run the BigchainDB configuration, and you start the BigchainDB daemon. Each one of the server nodes would run this, and we have scripts to deploy clusters on AWS and so on. Then start up a new terminal and get Python going there; I'm going to give an example of running a client. First you import Bigchain, and then you say b = Bigchain(), so now you've got an object of the class Bigchain. Now I'm going to show how you can create a digital asset and transfer it. It's just dead simple. First you import the crypto library from BigchainDB, and then you generate a key pair, which is a public key and a private key, and this is just
an example of how you get a key pair; of course, in a real-world scenario you wouldn't necessarily do that, you would get it from somewhere else. Then you have some sort of digital asset payload. You can have a message, or you can have something very detailed if you want; it's all JSON, so it's straightforward: Python dicts, which translate to JSON. Then you create a transaction, which basically says from whom to whom. In this case it's saying I'm creating, so I'm doing a register, and it passes in the payload. Then, and this is crucial, you have to sign the transaction. You sign it with your private key, and this basically proves that it was me who did the transaction, so when I write to BigchainDB the world can see that it was me, because they can verify against my public key. After the signing, the next step is simply to write it, and that actually writes it to the BigchainDB server-side instances. And of course you can retrieve it, just like any database. So it looks like, acts like, feels like a database, right? This is probably super dull to you guys, like, duh, of course this is a database. But that's the thing: this is also a blockchain. It's a blockchain database. The whole point is that blockchains can look, act, and feel like databases. So what does the transaction look like? You retrieve it, and it's got a few fields: basically an ID and then the transaction information. The transaction information has three main subcomponents: the fulfillments, which are sort of like the inputs; the data, which is the payload, where you can put a lot of things like metadata, artist name, etc.; and the conditions, which are sort of like the outputs. I won't go into great detail here, but that's roughly how it is. Once you've got the digital asset created, you can transfer it. So here at the very top I'm generating a key pair
for some user number two. Then I'm creating a transfer transaction, which is between user 1 and user 2, and of course I have to sign it. If I'm not the owner of this asset, the signing isn't going to be very helpful, of course, but I have the previous creation, and obviously I signed that one, so I sign it. The final thing is I write the transaction, and that goes off to the database. Now, if I neglect to sign this, or if I sign it with the wrong private key, then when it goes to the database it gets checked by the database nodes; they vote, they say no good, and it gets kicked out. What I just described, not signing, is one of a broader set of conditions. What if I created that asset and sent it to Sylvain, and then two minutes later I also sent it to Ryan? Well, Sylvain will get it, but when I try to send it to Ryan it'll complain: hey, you've got a double spend. That's what I'm showing here as an example: I create the transaction, I send it to a new person, and of course it complains and gives me an error message. So, starting to wrap up here: on the left side is what I talked about, the centralized cloud stack that we have today. In the case of Amazon, we've got EC2 for processing, file systems like S3, and a whole variety of databases, whether they're SQL or NoSQL or whatever, and then the applications running on top, on the web or on mobile. Now, the world is shifting. This centralized stuff is going to be around for a while, probably quite a long time, but we're moving towards something that's much more decentralized as well, sort of this vision of Internet 3.0. And you can take baby steps, which is pretty cool. A lot of the value is in things like registries, where you retain your full stack, fully centralized, except for one thing: you add one new database, and the new database
is BigchainDB. So maybe you keep around your MySQL database and your MongoDB database, but you're also using this other database, this registry that's out there, that you don't own or control, that no one owns or controls, but that a set of people collectively own and control. That's what the middle is: it's partly decentralized, and that's actually the stack that ascribe has. We've got centralized servers running Django on Heroku, and we're using S3, so the application itself, the ascribe web app, is partly decentralized. But what we're moving towards is a world where there are applications out there that are fully decentralized, decentralized apps, or dapps for short, or even apps that go beyond the idea of a single application; they're more like an organization, and that's the idea of decentralized autonomous organizations, or DAOs. That's the thing on the far right, on the top, and there's just an explosion of these. What's powering them is three things. First, Ethereum, which you can view as scripts running on decentralized processing that no one owns or controls, and there's a whole stack and ecosystem around Ethereum: platforms, etc. Under the hood, Ethereum hasn't traditionally had much for storage, so that's where a decentralized file system has been emerging: IPFS, the InterPlanetary File System. It's super cool, and I encourage you to check it out too. And then the database, which is actually what we've been building and what I presented to you today: BigchainDB. There are public and private versions of all of these. With BigchainDB, we'll actually be more fully announcing a public version of BigchainDB in a couple of weeks, but in general you can roll this out for private networks or public networks. So we're moving towards a much more decentralized world, sort of an Internet 3.0. And so I'll wrap up. Basically, BigchainDB:
it's a blockchain database, it's scalable, and of course it's in Python. If you want more information, feel free to come talk to me, go to bigchaindb.com to read the docs and get started, and check out GitHub. And we have a mascot: his name is Wrigley, he's a moose, and he's from Saskatchewan, Canada, like me. That's all; I'll wrap up there. Thank you very much. Thank you, Trent, for this stunning presentation. Questions? I would expect tons of questions, and we have only three minutes. Hello, thanks for the presentation, it's an amazing project. Something I've seen with blockchains and don't understand is how it works here: you mentioned that validation is through some kind of voting system, and I understand that the nodes are the entities that vote. So what stops me from spinning up a lot of nodes to get a lot of votes, for example, or abusing the system that way? Yeah, so that, by the way, is called a Sybil attack, and it's a problem when you have a fully open network, when you have unrestricted membership. The way Bitcoin solves that is by saying your vote is based on how much electricity you spend. How we do it here is with a federation, which is a list of public keys. So basically, if I'm creating a network for, say, the medical community, I'll just have the list of the 20 hospitals that I want running nodes, and that's it. Then you can add nodes and remove nodes on the fly, and the adding and removing is done by the existing nodes voting on it. When we roll out the public version of BigchainDB, it will also be based on this concept. We've actually been working on gathering together a lot of organizations that really care about the future of the Internet and the past of the Internet; collectively they will form a bunch of nodes, and then to add or remove nodes, they collectively vote. So it's actually a pretty good workaround: you don't need to get fancy with crypto-economics, with crypto coins or anything
like this. It's just a much more pragmatic solution, and it also happens to scale a lot better. But at some point there's this centralized organization, perhaps democratic, that decides who enters and who doesn't? No, you don't need that. You have these 19 nodes, and one entity can propose, hey, I'm going to propose adding this node here, and then all the other nodes vote. People can sit down and decide whether it comes in or not, so you don't actually need a centralized entity. It's just like The DAO right now: The DAO is this code that's out there running, and you get a vote proportional to how much money you put in, so if you put in 10 million dollars you get less than 10% of the vote. Anyway. My question is: I've heard that in the Bitcoin blockchain there are validators; they validate each and every transaction, and they push the block to the blockchain, correct? Yeah, that's correct, that's how Bitcoin works. There are nodes that validate that the blocks are okay, and you spend electricity in order to get the chance to be elected, randomly, based on how much electricity you spend. If you get chosen in that round, you win the lottery, and then you get to validate that ten minutes' worth of transactions, and that's how that block gets validated. How we work is we say: ignore all that extra complexity of spending electricity, etc., and instead the validation is done by these existing nodes. So there's no mining per se, and there's no cryptocurrency being created; the reward is extrinsic. The participants in the system that are doing the validation have a collective interest in seeing this network run and survive. I have a related question on incentives: if it works, but it doesn't work perfectly and it would benefit from an upgrade,
don't you have a tragedy-of-the-commons situation or something like this? There are trade-offs: who votes, who's going to spend money to run more nodes, stuff like that. Yeah, so in general with this sort of setup you're probably going to have 15, 25, 35 nodes; you're not going to have 5,000. And usually it's going to be a bunch of people that you probably know: they're all in your trade organization, or they're all the people in some supply-chain ecosystem, etc. They're all going to be incentivized to have the most recent version of the software. If it happens to be just a speed optimization, maybe some won't bother upgrading. In the world of blockchains this is called a soft fork versus a hard fork: a soft fork means you upgrade but it doesn't break the protocol for nodes that don't, and a hard fork means everyone really has to upgrade or they can't keep going. In the world of databases, think about running a database live with live data on it, and you do a migration of the schema, just a schema migration. That's actually a much healthier way of thinking about it than hard forks and soft forks: it's more like scheduled maintenance upgrades and so on, where everyone agrees, and if they don't agree, maybe things stay stuck on some old version. That's how it works. But for overall governance, the general approach is basically this majority idea, whether it's adding or removing members, upgrading or not, etc. So if you consistently don't upgrade, you can get removed as a node, and another node that properly upgrades will come in. Yeah, great talk, Trent. This leads me to my question, which is about majority votes: have you thought about something other than a majority to decide? There are all kinds of ways to reach consensus, I guess. Yeah, there are a lot of ideas out there; that's actually a
pretty broad design space, but majority is a pretty good starting point. There are a lot of other approaches, even in the world of traditional blockchains. With Bitcoin there's the idea of the longest-chain rule: whichever chain of blocks is the longest, that's the ground truth. With Ethereum there's something called the GHOST protocol. You can have 2/3 votes, you can have a majority vote, you can have things weighted. Majority is actually pretty good, and it also ties in nicely with theory from Byzantine fault tolerance, the 2f + 1 and 3f + 1 bounds. So there are other, theoretical reasons for sticking with something simple like this. I would like to stop here. Thank you very much, one more time.
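The talk walks through generating a key pair, signing a CREATE transaction, writing and retrieving it, transferring it, and having a double spend or a wrongly signed transfer rejected. The following is a minimal, self-contained sketch of that flow in plain Python; it is not the BigchainDB client API. It assumes textbook RSA as a stand-in for the Ed25519 keys BigchainDB actually uses, and `make_tx`, `ToyLedger` and the field names (`input`, `new_owner`, `data`) are hypothetical: `input` plays roughly the role of a fulfillment and `new_owner` that of a condition. It needs Python 3.8+ for `pow(E, -1, phi)` and is for illustration only, not security.

```python
# Toy model of the create / sign / write / transfer flow from the talk.
# ASSUMPTIONS: not BigchainDB's API or wire format. Textbook RSA stands in for
# the Ed25519 signatures BigchainDB uses; make_tx and ToyLedger are hypothetical.
import hashlib
import json
import secrets

E = 65537  # standard RSA public exponent (prime)


def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)  # random base in [2, n-2]
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: n is composite
    return True


def generate_key_pair(bits=256):
    """Return (private_key, public_key) as (n, d) and (n, E)."""
    def random_prime():
        while True:
            c = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # full size, odd
            if is_probable_prime(c):
                return c
    while True:
        p, q = random_prime(), random_prime()
        phi = (p - 1) * (q - 1)
        if p != q and phi % E != 0:  # E is prime, so gcd(E, phi) == 1
            return (p * q, pow(E, -1, phi)), (p * q, E)


def digest(body):
    """Deterministic SHA-256 hex digest of a JSON-serializable dict."""
    blob = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def sign(body, private_key):
    n, d = private_key
    return pow(int(digest(body), 16) % n, d, n)


def verify(body, signature, public_key):
    n, e = public_key
    return pow(signature, e, n) == int(digest(body), 16) % n


def make_tx(operation, input_id, new_owner_pub, data, signer_priv):
    """Build and sign a CREATE or TRANSFER transaction (hypothetical format)."""
    body = {"operation": operation, "input": input_id,
            "new_owner": list(new_owner_pub), "data": data}
    return {"id": digest(body), "transaction": body,
            "signature": sign(body, signer_priv)}


class ToyLedger:
    """In-memory stand-in for the federation: validate, then store."""

    def __init__(self):
        self.txs, self.spent = {}, set()

    def write_transaction(self, tx):
        body, op = tx["transaction"], tx["transaction"]["operation"]
        if tx["id"] != digest(body) or op not in ("CREATE", "TRANSFER"):
            raise ValueError("malformed transaction")
        if op == "CREATE":
            signer = body["new_owner"]  # the creator registers the asset
        else:
            if body["input"] not in self.txs:
                raise ValueError("unknown input")
            if body["input"] in self.spent:
                raise ValueError("double spend")
            # Only the current owner of the input may transfer it.
            signer = self.txs[body["input"]]["transaction"]["new_owner"]
        if not verify(body, tx["signature"], tuple(signer)):
            raise ValueError("invalid signature")
        if op == "TRANSFER":
            self.spent.add(body["input"])
        self.txs[tx["id"]] = tx
        return tx["id"]

    def get_transaction(self, tx_id):
        return self.txs[tx_id]
```

In use: create an asset with `make_tx("CREATE", None, pub1, payload, priv1)`, transfer it with `make_tx("TRANSFER", create["id"], pub2, None, priv1)`, and a second transfer of the same input raises `ValueError("double spend")`, while a transfer signed by someone who doesn't own the input raises `ValueError("invalid signature")`. In the system described in the talk these checks are done and voted on by the federation nodes, not by a single in-memory object.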
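In the Q&A, the answer to the Sybil-attack question is a federation: a fixed list of node public keys (for example, the 20 hospitals), where only listed nodes get a vote on whether a block is valid, so spinning up extra nodes gains nothing. A minimal sketch of that rule, assuming a strict majority of the whole federation decides; `Federation` and `block_status` are hypothetical names, node keys are plain strings, and the signing of votes that real nodes would do is omitted.

```python
# Sketch of federation voting with an allowlist of node public keys.
# ASSUMPTIONS: Federation and block_status are hypothetical names; real nodes
# sign their votes, which this sketch omits.


class Federation:
    def __init__(self, member_keys):
        # The allowlist: only these public keys get a vote.
        self.members = frozenset(member_keys)

    def majority(self):
        """Strict majority of the whole federation, not of the votes cast."""
        return len(self.members) // 2 + 1

    def block_status(self, votes):
        """votes maps voter key -> True (block valid) or False (invalid)."""
        # Votes from keys outside the federation are ignored, so spinning up
        # extra nodes (a Sybil attack) adds no weight.
        counted = [ok for key, ok in votes.items() if key in self.members]
        yes = sum(counted)
        no = len(counted) - yes
        if yes >= self.majority():
            return "valid"
        if no >= self.majority():
            return "invalid"
        return "undecided"
```

Counting against the size of the federation rather than the number of votes received means a block can't be decided by a handful of fast responders; the trade-off is that it stays "undecided" until enough members have voted.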
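The governance answers in the Q&A (adding or removing nodes on the fly, and removing a node that consistently doesn't upgrade) all come down to the existing members voting by majority, with no central entity. A minimal sketch, assuming a proposal passes when a strict majority of current members approve; `GovernedFederation`, `decide` and the `("add" | "remove", node)` proposal format are hypothetical, and real proposals and votes would be signed.

```python
# Sketch of federation membership changes decided by majority vote.
# ASSUMPTIONS: GovernedFederation and decide() are hypothetical; proposals are
# ("add" | "remove", node_key) tuples and votes are unsigned booleans.


class GovernedFederation:
    def __init__(self, members):
        self.members = set(members)

    def decide(self, proposal, votes):
        """Apply proposal if a strict majority of current members approve it."""
        action, node = proposal
        if action not in ("add", "remove"):
            raise ValueError(f"unknown action: {action}")
        # Only current members' approvals count; outsiders cannot vote.
        approvals = sum(1 for voter, ok in votes.items()
                        if voter in self.members and ok)
        if approvals < len(self.members) // 2 + 1:
            return False
        if action == "add":
            self.members.add(node)
        else:
            self.members.discard(node)
        return True
```

For example, in a five-node federation where node `"e"` stays on old software, three approvals for `("remove", "e")` remove it, after which the majority threshold for the remaining four members is three.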
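The closing answer ties majority voting to Byzantine fault tolerance and its 2f + 1 and 3f + 1 bounds. The arithmetic can be sketched as below, assuming the standard model where up to f of n nodes may behave arbitrarily: n nodes tolerate f = floor((n - 1) / 3) such faults, and a quorum of ceil((n + f + 1) / 2) votes, which is exactly 2f + 1 when n = 3f + 1, guarantees that any two quorums share at least one honest node. The function name is hypothetical, and this is the textbook bound, not necessarily the rule BigchainDB implements.

```python
# Byzantine fault tolerance arithmetic behind the "2f + 1, 3f + 1" remark.
# ASSUMPTION: the general-n quorum ceil((n + f + 1) / 2) is the standard
# textbook bound; it equals 2f + 1 when n = 3f + 1.


def bft_limits(n):
    """Return (f, quorum) for n nodes.

    f is the largest number of Byzantine (arbitrarily faulty) nodes
    tolerated, i.e. the largest f with n >= 3f + 1. Any two quorums then
    overlap in at least f + 1 nodes, so in at least one honest node, and
    the n - f honest nodes can always form a quorum on their own.
    """
    if n < 1:
        raise ValueError("need at least one node")
    f = (n - 1) // 3
    quorum = (n + f + 2) // 2  # integer ceil((n + f + 1) / 2)
    return f, quorum
```

For the 20-hospital federation from the Q&A, `bft_limits(20)` gives `(6, 14)`: tolerating up to six misbehaving hospitals requires 14 matching votes, which is stricter than the simple majority of 11.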
Info
Channel: PyData
Views: 14,543
Rating: 4.889401 out of 5
Id: 1NHHmHVCWy0
Length: 43min 19sec (2599 seconds)
Published: Tue May 31 2016