Rich Hickey - The Database as a Value

Hi. I want to talk about the database as a value. What I'm going to talk about is something that I've implemented in Datomic. How many people have heard of Datomic? Have you not heard of Datomic? Okay. I'm not going to describe Datomic in full; I don't have time. There are several hours' worth of talks around it. In brief, it's a database. It focuses on taking the database apart and giving you those parts as independent pieces, in particular so that you can get more power in your applications by having the database as a value, which I will talk about today. It revisits what the data model should be, what an information model should consist of, and decides that it must incorporate time, and I'll talk more about that. And we can do this now because we have a lot more compute power and resources, so that's why it's a good time to revisit these things. It's not new in terms of being novel, necessarily, because all these ideas have existed for a very long time.

So why are we doing any of this stuff? Why do we do functional programming? How many people here program in a functional language, or in a functional manner in a non-functional language? Okay. And what's the breakdown in the room? Clojure people? Scala people? Erlang people? Other kinds? Haskell people? All right, who did I miss? F# people?

There was this great paper, "Out of the Tar Pit." How many people have read it? All right. Don't read it now, but a lot of people have not read it; go grab it after, you know, tomorrow, and read it. It's a great paper. It's really a thought piece. What they were asserting was that our programs are too complex for us to get correct, and one of the reasons why is uncontrolled state. I don't disagree with Eric, right: there is mutation, we need to deal with it, and dealing with it was what was important, not pretending it doesn't exist. And they proposed a suite of techniques that we could adopt in our programming that
would reduce the complexity in our programs. One was functional programming. Another was more declarative programming, because even functional programming still has a lot of order in it, right? A lot of these concatenated functional chains: there's order inside that stuff, and that order makes it hard to rearrange things. Some sufficiently smart compiler is supposed to do that, but we have yet to find one. So there were interesting points about state and control. But finally, the biggest thing was the state part. They said: we don't want to have a whole ton of mutable variables; what we'd rather have is a sound model, an information model, that we could use, and then every now and then it would change. And they chose in this paper the relational model and relational algebra. So what they imagined was a system where you would program with functional programming, and declarative programming in the form of relational algebra, against a data model which was the relational model for data.

But there was this missing piece, which was a presumption that this relational model you were programming against somehow would become different every now and then. It was like, you've seen the comic strip, right, where there's this guy at a blackboard and he's got this giant equation, and at the bottom it says "and then a miracle occurs"? And then you have a new value of the database. And this was not implemented. It was a great piece that came out after I had started Clojure, when I was trying to decide whether or not to stick with the functional approach, and it was very inspiring for me to continue to do that. And in Datomic, what I've tried to do is tackle that last bit, which is: what is the process that creates a new value of a database, and how can we actually provide that value?

Well, look, the slide comes in pieces. So how many people are in IT? Basically, yeah, all of us. What does the I
stand for? Information. What is information? This is what it means, right? These just come from the dictionary; the dictionary has the meanings of stuff. To inform is to convey knowledge via facts, to give shape to the mind. And what is information? It's just the facts.

How many people remember the monkey before it was ripped apart? I hope everybody, or there's been a lot of drinking happening. How many people could tell a story about what happened to the monkey? Good. Now, you can do those two things because your brain is not pretending to be the monkey. It's not the job of your brain to pretend to be the monkey. It's not the job of your program to pretend to be the monkey. It's the job of your program to maintain information about what happened to the monkey. That's what it is.

And in particular, we use databases to store information. So we have a whole ton of complexity that comes with databases. This tar pit paper says, well, the database magically changes and then you have a value, this big relational model that's immutable, and you can do all the algebra against it like you can do algebra against numbers. But the database has this problem: it's essentially this place you go to. You ask it for something, you get something out of it, and you go again and you get something different. There's no basis for your calculations. And again, it's the same kind of thing: if you wanted to tell a story about the monkey, and the part of your brain that remembered the monkey was being trashed by thoughts of, I don't know, some computer game, you couldn't do that. You couldn't reason about the monkey. You couldn't do your talk about the monkey if you couldn't go back to some basis that was shared: remember the monkey that was all intact? Okay, well, then this happened to it. What if I had done something different to that? If we don't share some foundation, we can't carry out a logical argument. And this isn't
about logic or something fancy, right? If you have a business, you can't make decisions if you don't know what happened before yesterday. But we have this problem: we can't go back. We actually can't. We're not correctly implementing memory. That's our problem. And we're not going to get there by pretending to be the monkey; that's not memory. Other problems with the database: it's remote, so we have this communications problem that leads to a bunch of issues, and the notion of what an update is is poorly defined.

So what do I mean when I say basis? Well, I sort of talked about this. It's how we do calculations. You can say there's no such thing as values, anything could change at any time, but you really could not define any mathematical algorithm, or anything else, or any decision-making process, if in the middle of that process the things you said were the foundations of a prior decision have changed. If 42 could change into 57 in the middle of a calculation, if registers in the computer just had quantum effects randomly changing them, you really couldn't get any work done. So we need this basis. In particular, we're going to have multi-step operations, so we may want to revisit the basis more than one time and have that revisiting mean the same thing. Any time we have the notion of simultaneous change, this doesn't work. It's not to say that there isn't change in the universe, but when we're trying to think, we actually need something stable. When we're trying to make a decision, we need something stable to use, and that's what I'm going to call the basis.

And we have this problem also with databases: update. What does it mean? Does an entire row replace another row? Is it a column thing? Does the whole document replace the whole document? Can you replace one part of a document? If more than one person is doing this at
the same time, what's happening? What are the semantics of update? It's very tricky. Now, traditional databases have gone to great lengths to try to define as-if-sequential operations and things like that, so you can start to reason about it. We're starting to discard all that as we adopt NoSQL. We say, whatever, more than one thing is going to happen, and it's "eventual consistency," as if just by saying that we're good. It's going to be fine; eventually it's all going to be good. That's just a lie.

And then, what can we see? I'll use the word perception a lot. What can we observe? Because if we couldn't actually see light rays bounce off the monkey, if as they traveled back to us they got all scrambled, if we couldn't observe a consistent view of light coming off the monkey, we couldn't even have experienced this whole little presentation. What do we see when we see things? One, we see the past. We don't have a live connection to the monkey; our eyeballs are not on the monkey. We see the past. What else is characteristic of our neural systems? Are they continuous? No. They save up stuff. You know, these are light waves, or maybe they're quantum, we don't know, but something wave-like is happening. It's hitting our eyes, the retina's reacting, and the nerves are doing what? Are they streaming continuously? No. They're taking pictures: big simultaneous snapshots. And that's what we're storing in our memory. We need that; that's how we work. If we're trying to build systems, if you want to emulate anything, don't emulate monkeys getting torn apart; emulate brains. Brains are more powerful than stuffed monkeys.

So what happens? We get wrong programs. We have problems trying to scale things. We're afraid of all the round trips to the database: ooh, I'd better go once and make sure I do everything I might need to do.
We're afraid of overloading the server, because we co-locate all this stuff; we have this big database server brain that does all this. But it really leads to programs that are poorly modularized. You're going to have one part of the program that decides we should talk about this person, and another part of the program that decides, well, it's going to go on a screen, and the screen should show the email or whatever. Who writes those two parts of the program? Different people, I hope, maybe, if you have a big enough company. Who writes the query that gets this data? Different people? No. You're trying to shovel these two conversations through one query, because you're afraid that if you issue two queries, what are you going to get? Things that don't match up, because you have no basis. So we have poor modularity.

So we need to pull this apart. We need to answer the coordination question. Obviously some coordination is going to be required if we don't want eventual-consistency magic. We need to eventually decide to coordinate; that's what needs to be eventual. We just need to decide where that happens. Process requires it. Perception does not; perception should not require coordination, and almost all the time it does, and that's wrong, because that's not how perception works. That's not how we saw the monkey. Did anybody ask the monkey, "Send me your image"? Anybody do that? Were you all sending messages to the monkey: "Please give me whatever"? Did anyone register for updates from the monkey? No. That is not how it works. Light bounces off the monkey. The monkey has no say about it. It comes towards you. You can open your eyes or not, but that's all just going to happen. How do we model that?

And then we get back to the value problem. If you can't at some point get out of the state and
say, "I can get something that's immutable that I can think about," you can't really have a conversation beyond that. It's just: well, everything is fuzzy and changing, and there's no way we can ever make a sound decision, there's no way one person can convince somebody else of something, there's no way we could add up our books or calculate what our interest rate should be or anything else. We're just stuck. So we've got to reach immutability at some point.

All right, so I'm going to talk about value. When I say value, this is what I mean: something that's immutable. The dictionary definition starts with small things, and we all know this: 42 is a value. We've already scaled that up. In how many of the languages people are using is a string a value now? Most, right? How many people are using languages with mutable strings? Sorry. Most of you are not. And then you can get bigger: dates. Eventually Java will learn that dates are immutable, but Joda-Time has learned this. And it can get bigger still: collections can be immutable, entire collections. How big can we dream here? Can we make an entire database immutable, or effectively immutable, or appear immutable, so that we could, for instance, see it, so that we could, for instance, remember it? Let's try it.

When I say identity, I'm talking about some logical notion. It's not necessarily the name; it's some logical notion that we apply to a sequence of values over time. It's not to say that this entity that we're thinking about doesn't change, but there definitely was a time when the monkey was all together, and now the monkey is different, and we can remember those different times and we can process them. And we want to make programs that can do the same kind of thing. So we're going to say the state is the value of an identity at a point in time. And a lot of what Eric was talking about, you could say, were like identities, right?
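The three terms just defined (value, identity, state) can be pictured in a few lines of Python. This is only an illustrative sketch: the `Identity` class, its `advance` and `state_at` methods, and the use of small integers as points in time are all assumptions made for the example, not anything from the talk or from Datomic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Identity:
    """Hypothetical helper: a logical entity that we associate with a
    succession of immutable values over time."""
    # Each entry is (time, value); times are assumed to be increasing integers.
    history: List[Tuple[int, Any]] = field(default_factory=list)

    def advance(self, t: int, f: Callable[[Any], Any]) -> None:
        """Apply a pure function to the latest value and record the result at time t.
        The earlier values are kept, never overwritten."""
        _, current = self.history[-1]
        self.history.append((t, f(current)))

    def state_at(self, t: int) -> Any:
        """The state: the value this identity had at time t (None before any value)."""
        value = None
        for when, v in self.history:
            if when <= t:
                value = v
        return value


# The values themselves are immutable (frozenset), so a remembered state
# cannot change underneath whoever is looking at it.
monkey = Identity([(0, frozenset({"head", "arms", "legs"}))])
monkey.advance(1, lambda parts: parts - {"head"})

before = monkey.state_at(0)   # the monkey all together
after = monkey.state_at(1)    # the monkey after the talk
```

Reading `before` and `after` requires no coordination with whoever calls `advance`: each is just a value that stays the same however often it is revisited.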
If you can go to that Lazy and ask for the thing out of it, the Lazy might be doing anything; the thing is the value you get out. Or if you can ask an enumerable for the next value, and you talk about those kinds of values, then that enumerable thing is acting more like an identity. And time is just this relative thing that we don't know a lot about. So we have this model we could use. We could say an identity is just a succession of states, and each state is a value. In the real-world case, does the monkey stay around? No. But our memories of the monkey can stay around. That's what we're talking about here. We need to make programs that can remember things, and we need to make programs that can see things. So we're going to use functions to go from one state to another, we associate identity with that whole thing, and we're able to perceive. The important thing is that perception is uncoordinated. You can just see; it's like light. You don't need to say, "Stop, wait, Eric, don't do anything, so I can look at the monkey now."

Okay, so we know how to implement values in programs: we use persistent data structures. I don't have a lot of time to talk about it, but basically they're trees, with structural sharing. We're not talking about persistence on disk; we're talking about immutability and efficient change. The technique that's used is you represent your data structures as trees, and when you want a new version, you don't copy the entire thing. You copy as little as possible, and you share structure with the old version. That allows the old one to be immutable and the next one to be efficiently created. This is how we do it in memory. One of the things that this talk will be about is what we do on disk, what we do when we want to make this durable.

So I think we can go back. That original slide was from a talk where I was talking about something else: objects. It's amazing what happens to the slides when you give them to somebody else
and they put them together. And I think this immutability stuff is awesome. I drew that original diagram and talked about objects, but it ends up that the database is a big nasty object. It's the big object we all have to contend with in our systems. It's this place we go to. We say, "Please, please change somehow," and something happens inside. And we go up to it and say, "Oh please, please tell me the answer to this question." It's like this oracle that sits there. It's a big object. And what's critical here is: what can we remember? Nothing. If we apply our function to something, what did we apply it to? Who knows. Consistent read, eventual read, serializable: who knows all those things really well? Yeah, great. Not working.

So what's missing from this diagram, from what we had before? That stuff. Oh, this is great, because that was an image, so I can, like, erase the image [unclear].

So we can say the same thing about the database, right? Except what is the identity? The database connection. In a traditional database, that database connection is all we have. It's the totality of what we know. It's like an object; it only has methods: query, answer, change something. That's it. This connection is the only thing we can manipulate. We can only write functions of the connection. That's what we do: change functions of the connection, and queries of the connection. We don't want to do that. We want to do this. The database connection is still there; it's the identity. Tomorrow, if you go ask your database for its value, it's going to be different. If your company is still in operation and you had power overnight, you should expect a different answer. That's okay. Stuff does happen in the world. Process happens. The world does move forward. But do we really want to be writing functions of
connections and queries against connections? No. We want this good stuff inside: database values. Because if you use the connection as the identity this way, what is your function a function of? Whatever serializability mode you're in. What is the query a function of? Whatever state the database is in. So it's a function of string to result set, or something like it. What can you do with the result set? Not a lot. You go walk through it and print it. It's not a really rich thing. If you could get the value of the database, now what could you do with it? You could query it all friggin' day long. What else can you do with values? You could apply a function to it. What else? Not have an ORM. How about communication? Can you email it to your friend? When he gets it, is it going to be 57? No. So that would be nice. Also, it'd be a lot easier to reason about applying functions to the value of a database than to a database connection. We'd like to make transformation functions against that.

So how can we start talking about the database as a value? The first thing we have to contend with is this notion of state, because everyone knows databases change. So they do. And so the question is: do they change in place, or do they change by growing? Most traditional databases, what do they do? They change in place. They pretend to be a monkey. That's awesome. They forget everything they ever knew every time you tell them something new. They're great decision-making tools. How many people keep your source code in a directory, and every time you change a file you just overwrite the one in the directory? Yeah, nobody's doing that right now. We're programmers, and we know what we need. What do we need? Every freaking version we ever made. And what's on every version? When and who. Does anybody care that the git repository
keeps growing? Is anyone like, "No, I'm starting over, writing over it, my disk is filling up, dude"? No. Why did we ever do that? It's really not an information system. A database that updates in place is not an information system. I'm sorry. A real information system should accrete facts, because that's what happens in the world. Yes, stuff happens, but an information system needs to keep gathering the stuff that happens, not pretending to be the world. It's supposed to record information about what happened in the world. The past does not change. If you're an information system, you have to record what happened in the past. You don't go back and fix it. You don't change who the president was by typing "Obama" over "Bush." That's not what happens in the real world, and that's not what should happen here. So if you have process, change, and you're trying to build an information system, you should acquire new space to record new information and move away from using places.

So how are we going to do this? This talk is just about a slice of Datomic, and it's about the idea of making an accreting, accumulating database, but also a little bit about how that model I showed before, the one we use in memory with a new root every time there was a change, carries over. When we do persistent data structures in memory, every time we change we end up with a new root, and usually we forget about the old root, because often there's a single thread of control that cares about the transformed values and the old values get garbage collected. Sometimes we could remember them, but the only way we could remember them is by having been there. In other words, if you're using persistent data structures in memory and you never saw this value, and now you're looking at the later value, the old one is unrecoverable. You can't do it. But if you're making an information system, the past should be recoverable.
We don't want to store a root per query, both because we'd have to have a way to find the roots, which means another meta-mapping to roots, and because it's way too expensive to do. So instead, what we're going to do is have this value that just gets bigger, like the rings of a tree. Is that still a value? It's a little bit tricky, because now it's a bigger tree. But one of the things about a value is: if I saw X and then go back and look at X, is it any different? Say you've made a decision based on the inner rings of a tree, you know, "this is what's happening to the weather," and then you send a view of the inner rings of the tree to your fellow tree scientists, and they want to look at the same thing. As long as the inner rings of the tree have not changed, even if maybe five years have passed, as long as the photo you took is still good and people weren't erasing it and drawing a bigger tree over it, then they could make an independent decision later. So I'm going to contend that an ever-accumulating value is still a value by the definition I gave before.

So let's talk about facts. Again, these words are words; the dictionary has an entry for fact. It says: an event or thing known to have happened or existed. And it's actually from a past participle in Latin; back then, they knew what they were doing. It means "something done." It explicitly says it's about the past. If that's what fact means, it's something that happened, it's about the past, and it has to include time. "Sally likes pizza" is an insufficient fact. She used to be lactose intolerant, then she got cured, and now she likes pizza. It's not an eternal thing that Sally likes pizza. There wasn't always a Sally. She might stop liking pizza. We're not going to pretend to be the monkey here. We're not going to go to the
Sally place and change this. At some point Sally started liking pizza; at some point later Sally might stop liking pizza. These are facts. We can record all these things, and we can make decisions about what's happening in the universe. So we want to record facts. So what's our structure? What's a fact? Is a document a fact, a whole document? A fact is something that happened. What happens when we want to change it? Do you change the whole document? No, it's too big. How about a record in the database? Is that a fact? When did the record happen? And if you update one of the fields of the record, when did that happen? What's the time of the record later? How many people have ever tried to build a system that kept a last-edited date on records? Everybody, right? How many people thought that was a good time? Yeah, no.

So if we want to make a system that records facts, we need to boil down what a fact is. In Datomic, and the name comes from the word we use for this, it's a datom. It's not a new idea: entity, attribute, value. Sally likes pizza. And then: when did Sally like pizza? We use the transaction to encode that, as opposed to the time of day, and there are a bunch of reasons, because transactions are first class and we can have facts about transactions, like who said that Sally likes pizza, or where did I find that data.

The other thing we need to do, if we want to make a system that can manage information about the world, is have a way to represent process. The fact that the monkey had its head and arms torn off: what is the representation of that happening? We can say the monkey had its head on before the talk, and then what? There'll be a new fact, which is that at a certain point in time the monkey did not have its head on. But our information system is going to need to have that information sent in. So what's the
representation of process, of that change? How do we form that? We're going to say that a primitive representation is just the assertion or retraction of facts. Monkey has a head. Monkey's arm was torn off. Monkey's leg was torn off. We're just going to assert these things. If Sally likes pizza and then becomes lactose intolerant, we may have to retract that. That's okay. This is minimal, but it's insufficient, because we know there are transformations that happen in the world, or transformations we'd like to represent in our business, that we can't express by just saying "your bank account balance is now 1010." What we really want to say is "your bank account balance is a function of what it was, plus 10." But we can apply those transformations and then turn them into these assertions and retractions, and this is what we end up with at the bottom. I'll show you how that works in a second.

So fundamentally what we want to do is take the database apart and try to get the value part out. There are other aspects of the system I'm not going to talk about much tonight, but if you look at a traditional database, it's doing a bunch of things: it's doing transaction management, it's doing I/O, it's usually in charge of local storage, it does indexing, and it has the query support. That's what we call the traditional monolithic database. And there's a way to split this apart, and I think this split is really important. There's a process side to a database, which is accepting and recording those transactions and possibly organizing that information for other people, and that's the output side: the database is going to spit the stuff onto storage. And then there's a perception or reaction side. It's not that the reactive stuff is bad; in fact Datomic has a push-based, "call me when new stuff happens" plug-in. But there also should be just straight perception. I just want to look at the database. I don't want to ask anybody's permission. I don't want to coordinate with
anybody. I just want to look. It should be like light: I should just be able to open my eyes and see stuff. And that part incorporates both query and, by implication, the consumption of indexes. So we'd like to take this stuff apart, and there are a bunch of reasons to do that. One is scalability. The other is to address a bunch of the complexity things from before. And the point of this talk is to give that value of a database to a program.

So I'm going to contend that if what you're using to store data (another completely empty word that we use) doesn't organize it, it's not really a database. It's like a bag. You put stuff in a bag, and you can't query the bag; you just have to fish through it. That's not a database. A key-value store is a bag. It's not really a database. There's no leverage there. Indexing is what gives you leverage; organization gives you leverage. So what we're going to say is that a way to organize these facts is just to sort them. Simple sorting is a powerful tool; that's why we have computer science, right, they invented sorting. But it ends up that this is a hard thing to do. In particular, if you want to have an immutable set of values for your database, you cannot afford to sort live. In fact, a lot of databases can't afford to sort live even when they're updated in place. Live sorting, live indexing, is a huge source of overhead for databases, and there's great research from Yale and other places showing we're wasting a lot of time around that stuff.

But we have good examples, one of which is BigTable, of how to do this better, and the way to do it better is to do it in batches. You accumulate change in one accessible area, and periodically you put that stuff into another area, and anybody who wants to see everything merges these two things. In the case of BigTable, they do take an immutable approach to storage, which means they'll accumulate stuff in memory. Now, they're also logging it, so this isn't about
durability. Just because they've accumulated it in memory doesn't mean that if you pull the plug they forget everything; they're logging. But the information model is a combination of what they've accumulated in memory and what they've put on disk. So what they do is build up a block in memory, they sort it, and then every now and then they take that block and just spit it out on disk, and after they've done that they never change it. Then they'll get another one and put another one on disk. Now, at the point they put the second one on disk, if you want to see everything, what do you need? The two files on disk and whatever's in memory. Every now and then they've got enough things on disk and they say, let's merge all those into a bigger file. Once they've done that, if you want to see everything, what do you need? To merge that bigger file and what's in memory, and that's it. So this principle is a sound one, and it's very efficient.

The only thing that Datomic does differently is that, because we're trying to give all of history and do that efficiently, and there are a bunch of other storage-related things, we're going to use persistent trees on disk instead of ever-bigger flat files. So it looks like this. There's some process in charge of transactions. In the case of Datomic, because we split everything apart, we can make independent decisions about the scalability, consistency, and availability of reads and writes, and we actually make traditional decisions about writes: a single server. It only accepts transactions; that's all it does. And it keeps a live index, which is the novelty. It logs to storage as it goes, just for durability purposes, but every now and then, in the background, a process called index merging will take the stuff from the live index and merge it into a persistent tree on storage, and then you can forget the live index and start replenishing it. So you just have a small window that you accumulate, just like the BigTable model, but with the trees that you saw before, except instead
of having new roots, we're going to have one big root every time we do an indexing job, as opposed to every transaction, and the tree will have everything that ever happened, not just the latest.

Then what's perception like? Well, perception is just this merging job. If you have access to this live index and access to storage, you can see everything. Is there any coordinator in here? The transactor's not here. Is there a server? No, just like there wasn't one governing our looking at the monkey. Nobody's in charge of that; we can all do it. If we have the rights to storage, we can all do it.

So this is the high-level view of the architecture of Datomic. We've taken all those bubbles from the monolithic server and spread them around. In particular, we moved storage out, and we said that it is just a commodity key-value store or an old-style database; I don't care what you want to use. Anything that lets me put a blob in with a name and get it back will do. It's not a database, it's just storage. It's like squirrels hiding acorns: I can find my acorn and I'm good. That kind of a database. There's a transactor that just does transaction processing. If you want to keep it highly available, you do what you used to do for databases: just run a second one warm.

But this is the key thing: the application process has got query in it. My app servers have the query engine in them, because they can access storage themselves. The other thing the transactor will do is reflect change, so that everybody who wants to play (we call them peers) has their own view of the live index. They all do the same thing: once the new index has been merged, they're told, and they say, "OK, do over, start over, just drop that." They're always merging these two things. But now we have potentially horizontally scalable storage. This could be DynamoDB, any of those big, massively redundant, highly available things. And query is now horizontally scalable, because we can just add more app
servers and get more query capability. And what query is being presented with now is not objects. So what's the difference? In that old database, when I talked to the oracle, what could I get? At best, an answer to one question. What do I get here? The source of all answers to all questions. I get the power. That's what I want.

The memory index is pretty straightforward. It's a persistent sorted set, like a persistent B-tree, and you have pluggable comparators. We always maintain a couple of sorts. One is entity-oriented, which allows you to treat the database as if it were objects, except they do have history and everything else; they're not just pretend-to-be-monkey things. And you have attribute-oriented indexes, which allow you to treat the database like a column store. It's very important that you can slice the thing two ways. If you only have objects, you get just crappy queries. You can't answer basic questions, because it's like, "A million objects, dude? I can't ask a million objects for their name. I don't have time. I want something that has all the names." That's what the other index does. And there are all kinds of other indexes you can ask for.

On the storage side there's a log, and again, that's just so that you get durability as you go. But the fundamental thing is that there are going to be these covering indexes, and they're represented on disk, or in storage, as trees. So what the database will do is treat the storage engine like a key-value store, just like a traditional database treats the file system. A traditional database stores what in the file system? Is there a little file with your name in it? No. What does it have? Blocks of its indexes, index segments and nodes. That's how it uses the file system, and that's the way Datomic uses storage. It doesn't matter if we're using a SQL database: we don't store "Sally likes pizza" in a row in a SQL database. We take a big chunk of our index and store it in the
database under a key. That's why we can use key-value stores or SQL databases. So the requirements on the storage system are very minimal. You must have key-value access, occasionally we need a consistent read, and very rarely we need a conditional put. DynamoDB satisfies this, SQL databases satisfy it, and hopefully we can get things like Riak to satisfy it as well.

In memory, the database value looks like this merge. There's this live memory index, which I said is a persistent tree, and then there's the one that's backed on disk. What's really cool about this is that it's that lazy value thing. Is the entire database on your machine? No. This is not full replication. Every peer doesn't have every database; every peer has access to storage. If and only if you go and ask about some particular thing, and you don't yet have it, and it's not in your cache, and it's not in memcache, then you go to storage.

Now, how can we do all this caching? Are we afraid of monkeys exploding in place? No. Why not? It's all immutable. It's the facts. You can't change the past, and it's all the past. So we can remember it on disk, we can stick memcache in between there, and you can have a local cache in between here. It ends up that that stuff is zipped, and it's zipped here, and you can remember it unzipped. I don't really care; you can remember it all you want. You can stick proxies in between. You can put your database behind a CDN. Why can't you put your database behind a CDN? You should be able to.

And then we have those things from before: the identity. There's this one little CAS thing, one CAS. That's as much mutability as you need to implement a database: one CAS cell. It's got a pointer to an immutable thing and a pointer to another immutable thing. And yes, when something new is processed, this has grown, and this CAS gets a new one of these. But if you were thinking about the old one, it did not have its head ripped off while you were in the middle of calculating. Did not,
will not happen to you. And the same goes if you told your friend the scientist, "Could you please look at this weather data?" How many of you have ever had a system that exhibited a bug, and after you added more data to the database, it went away? Yeah, I have, more than once. How many people would like to have gone and said, "What was happening at two o'clock to your database?" and asked the same question again, and then been like, "Wow, yeah, that is the wrong answer," and then fixed the code, run it again against the same version of the database, and said, "Fixed, woohoo"? As opposed to doing what? Trawling through logs and hoping there's some evidence of what might have happened. We don't want to do that anymore. We really do want this.

The index is not terribly interesting, but it's a tree on disk. There are three levels: there's a root, which points to a directory of segments, and the final segments are just sorted datoms. They're blobs. You can think of them like B-tree nodes, except B-tree nodes leave space so that you can update them in place, but that means you can't cache them. There are trade-offs, right? It's not like, "Oh, cool." There are trade-offs, real ones.

So I said before that you can't express transformation, so we support something else, which is the transaction function. It's a cool function. Look at that argument, anybody: that's not the connection to the database, that's the database value. You get passed the database and any arguments, and you return new facts or new database functions, and the transformation happens iteratively until everything's been replaced. So eventually the process bottoms out into a bunch of assertions and retractions. It looks like this: you have assertions and retractions and maybe a database function, and that will be applied. That yields two more database functions, and eventually it all turns into assertions and retractions, with no more functional transformations. But each time they get passed the database, and you can do whatever
you want. You can issue queries. How many people wish SQL databases were more functional and composable, and that you could really write functions that return result sets that you could pour into other queries? Yeah, that's what it should be. You're not wrong; you're right for wanting to have that.

The transactor accepts transactions and expands them just the way I said. It applies them to the old value of the database, and that gets a new value of the database. It tells the people who care what was newly processed. It does not send a whole database to everybody; it sends just the change: "Monkey had head ripped off, 6:40." That's it. Not a new monkey document; the fact. So it broadcasts that back to the peers, and every now and then there's indexing in the background. Unsurprisingly, indexing creates garbage. It's just like what happens in memory. How many people work in a garbage-collected language? How many people want to go back? Yeah, right.

Then we have the peers. "Peer" is a great word. It means, "I trust you as much as I trust anybody else; we're like equals in this." That's how it should be. It's not like, "Oh, please give me the right answer." It shouldn't be like that. We have equal power. So any peer has access to the storage service. Peers have their own built-in query engine, and the query engine is really kind of an orthogonal thing. It happens to be the case that Datomic ships with a Datalog engine, but you could have any number of query engines; it's orthogonal. They have access to the live index and they know how to merge. Caching out the wazoo: local caching, caching in a local memcache, then a shared memcache cluster behind everybody else, and then finally whatever Dynamo does, or whatever a SQL server does, or you could cluster Postgres. I don't really care; do as much as you want. It's very straightforward.

So what happens after we do this? I mean, why do this? It sounds cool, all right. Because it's simpler, because
programming against connections sucks. It just sucks. It's not great; it really does suck. So we want this epochal state model I showed you before. We want to limit the amount of coordination we have to transaction processing. We want direct perception. We want to be able to reissue the same query. We want to be able to say, "I found the guy we should be reporting on, it's Fred," and then have somebody else say, "I need to show Fred on the web page, give me his email address and whatever," and not be like, "I don't know what Fred is anymore; Fred quit." Or, "I can't make a total of all the sales for the month, because I can't issue more than one query to do that per department and get a total that adds up." How many people want to work with an accountant who uses a ledger where he erases stuff and writes new stuff in?

So we have a stable basis. You can communicate a point in time of the database with a long, a transaction ID. You can say, "I think I found something weird in the weather data at this point," and you can send it to somebody else. They could look at it three months later and see what you saw, just like you took a picture, just like you remembered the monkey before it had its head ripped off. Same thing. So you have this basis, and a transaction is well defined: it's a function of a database.

What else happens? I said this already: you have the communicable basis. We can move anything around that we want. We now have this ability to disentangle where things are located. You can start with memory, you can use a local disk, you can put it in a SQL server, you can put the same thing in DynamoDB; I don't care. You run your peers wherever you want. Run half the app in the cloud and half the app locally; it does not matter.

Then the big payoffs, the big-time payoffs. The database as a value not only means that you can issue queries yourself, it means that you can talk about those inner rings. You can talk about the past,
just like we can talk about the monkey before its head was ripped off, and you can tell the story of it, because you can go back. That's what memory is. That's what you want your programs to have: memory. Now, you don't want them to be simulations of everything. There are places for simulations, and there are parts of your programs that will behave like machines, and mutability is great for that. But for information processing we really would like memory.

That means we can go and say (and this is a function of a database value that yields a database value): yes, you will talk to the connection, and you'll say, "Give me the database." Then you can say to that database value, "Show me what you looked like two months ago," or at two o'clock, when we were having that problem. Give me the database as of two o'clock. Give me the database since this point in time. Give me the database as if I had made this change. Has anybody ever wanted to see what a speculative change did, and tried to do it with rollbacks and stuff like that? Did anybody piss off their sysadmin doing that? Yeah, that sucks. That's not right.

So now you have a database value in hand. If you want to see it as if you did something else, what do you do? You just do it. It's a value. It's all you. You don't need to talk to the transactor. You're not actually saying, "I'm doing this." You're saying, "What if I rip the head off the monkey? Would it weigh less?" Yeah, look at that: a monkey without a head. I add up the components and it weighs less. That's so cool, because I can fit more headless monkeys in a truck. Let me go do that, and then go issue it and really do it for real. So you can do as-if things.

The other thing is that you can do queries that compare times and cross time. This is what businesses need. How many people are doing big data on logs, or think they might need to soon? Why? Because your business is telling you, "That database you gave me, it sucks. It forgets everything after I tell it anything new."
"We need to go look in the logs, because the logs have everything that ever happened and they don't forget anything. Let's start writing big data apps." Big data, let's go do it. So you can write a query that crosses time: how often has this supplier changed their prices? If I only knew their latest price, I could not answer that question. How good would your decision-making process be if you forgot every past value that you ever knew? It'd be terrible. Most business decisions actually involve the past; they involve deltas between things, or rates of change. So you need to be able to query as of different points in time, and across time.

So you can do all this stuff. You can issue a query that takes forever. Has anybody issued a long query that took a long time? Again, was the sysadmin happy with you? If you're doing this and you issue a long query, who does it impact? You. That's beautiful. Want to run long-running analytics? That's fine. Want to figure out if a query is really expensive? Go have at it, running on a spare box.

So the net impact here is that this is just completely different. It's super simple, and it really reduces the complexity in your applications. It gives you a lot more power. You're not handed just a result set from a query; it's a real database you can query yourself. You can scale things better: we saw scalable reads and scalable queries. And it's a real information model. That's it.

Yes? I will repeat the question. OK, so the question is: I talked about a value as an accretion, and I talked about identity as sort of this putative entity. So the identity is the database connection. I want to go talk to, you know, our company accounting system about our balances. The company accounting system is going to be some system, and it ends up that a system in Datomic is not one server; it's the combination of the transactor and
whatever the storage is, plus the name of the database. That whole thing is the identity. Given that identity, you can go and ask for the value of the database. As soon as you've done that, anything that happens out there doesn't impact you. It can't impact you.

So now the question is, what does it mean to have an accreting value? The first thing is that once I have this value in hand, it's not going to accrete. The thing inside the connection is accumulating new information, but this is not growing. The other thing is that if I communicated this to somebody, if I said, "Check out the answer to this query against this database," I would be communicating two things to them when I communicate a value. One is where it comes from, but the other is that T. It's being able to talk about that T that allows you to recover it later. So I say, "At two o'clock we had this problem." When I communicate that database to you, I'm going to say the identity of the database and just a T. With those two things you can recover a value that's the same value that I had, in spite of the fact that the thing inside that identity is now bigger. If we look at the inner rings of the tree, they're the same.

There's hardly any schema to change, right? So the question, yes, is: can we handle schema changes? What schema? Where's a record definition here, or a document definition? Nothing. It's atomic; there's almost nothing to change. Yes, you can rename attributes, because the only schema definition there is, is attributes. Of course there's always an implicit schema, but what matters is whether you have married an explicit schema. Relational databases definitely have a problem: you need to know this is kept in that table and that's kept in that table, and if you ever need to connect those two things, you'd better know the name of that table. Furthermore, table names are not first class; they can't be components of queries or anything else.

So yes, there's a transactor which coordinates everything. That's strictly serialized.
You can imagine it as a single observer of the universe. If you're talking about coordinating, for instance if we both want to increment somebody's balance, you would use those transaction functions for that. Rather than each of you asserting "the new balance is 107," you would say, "I think the new balance should be 10 higher than what it was," and, "I think it should be 5 lower." Those two things commute, and therefore it's fine. So you use database functions that are commutative, and you can have things that merge transparently without any eventual consistency.

But of course this is a definite trade-off. There are times when you have to choose eventual consistency because of your availability or write scalability requirements. So this database is definitely sitting at a point in the space, which is a big space, that says you are not going to exceed its write capacity, and you value the transactionality, but you want the read scalability. So it's in a different place from a monolithic database and from the free-for-all that is key-value stores.

Yeah, implications for issuing a query. Again, I said the query language is sort of orthogonal, and I'm not talking about it here, but the query language we include is called Datalog, and it's really easy to use. It looks like pattern matching, if you've ever seen pattern matching in any language, and the joins are implicit. We also offer you more control over what happens. I'm not a big believer in the magic box, you know, "the query planner will solve every problem," because it's too hard. So you do have more ability to drive it, but you can also get direct access to the indexes and build your own query engine. I'd be happy to see it. For instance, core.logic from Clojure can use this as a back end.
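
The balance example from that answer, together with the earlier points about recovering a value from an identity plus a T and about speculative "as if" changes, can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Datomic's API: the names `Db`, `Connection`, `adjust_balance`, `as_of`, and `with_tx` are hypothetical, facts are plain tuples, and everything lives in memory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A fact is (entity, attribute, value, t). Facts are only ever added.
Fact = Tuple[str, str, int, int]
TxFn = Callable[..., List[Tuple[str, str, int]]]


@dataclass(frozen=True)
class Db:
    """An immutable database value: all accumulated facts plus a basis t."""
    facts: Tuple[Fact, ...] = ()
    t: int = 0

    def value(self, entity: str, attr: str, default: int = 0) -> int:
        """Return the latest value asserted for (entity, attr)."""
        for e, a, v, _ in reversed(self.facts):
            if e == entity and a == attr:
                return v
        return default

    def as_of(self, t: int) -> "Db":
        """The database as it looked at basis t (the inner rings)."""
        return Db(tuple(f for f in self.facts if f[3] <= t), t)

    def with_tx(self, tx_fn: TxFn, *args) -> "Db":
        """Apply a transaction function to this value; returns a NEW value."""
        t = self.t + 1
        new = tuple((e, a, v, t) for e, a, v in tx_fn(self, *args))
        return Db(self.facts + new, t)


class Connection:
    """The identity: one mutable cell pointing at the current Db value."""

    def __init__(self) -> None:
        self._db = Db()

    def db(self) -> Db:
        return self._db

    def transact(self, tx_fn: TxFn, *args) -> Db:
        # Strictly serialized: each function sees the value left by the last.
        self._db = self._db.with_tx(tx_fn, *args)
        return self._db


def adjust_balance(db: Db, account: str, delta: int):
    """Hypothetical transaction function: a relative change, not 'set to 107'."""
    return [(account, "balance", db.value(account, "balance") + delta)]


def opening(db: Db):
    """Hypothetical transaction function asserting an opening balance."""
    return [("acct", "balance", 100)]


conn = Connection()
conn.transact(opening)                      # t = 1
before = conn.db()                          # a value in hand; it will not grow
conn.transact(adjust_balance, "acct", 10)   # t = 2
conn.transact(adjust_balance, "acct", -5)   # t = 3

# The same two adjustments applied in the opposite order.
other = Connection()
other.transact(opening)
other.transact(adjust_balance, "acct", -5)
other.transact(adjust_balance, "acct", 10)

# Speculative change against a value; the connection is never involved.
what_if = before.with_tx(adjust_balance, "acct", -100)

print(conn.db().value("acct", "balance"))   # 105
print(other.db().value("acct", "balance"))  # 105
print(before.value("acct", "balance"))      # 100
print(what_if.value("acct", "balance"))     # 0
```

Both orderings end at 105 because each function reads the value it is handed and states a relative change. `before` still answers 100 after the later transactions, `conn.db().as_of(1)` rebuilds an equal value from the connection plus a T, and the speculative `what_if` leaves the connection untouched.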
Info
Channel: jasonofthel33t
Views: 7,850
Rating: 4.9220781 out of 5
Keywords: Clojure (Programming Language), database, datomic, GOTO, rich hickey
Id: V6DKjEbdYos
Length: 56min 22sec (3382 seconds)
Published: Tue Apr 09 2013