Camlistore at LinuxFest Northwest 2016

Captions
Okay, so this one's called Camlistore. It's a project I've been working on for about six years. I work on it with Mathieu Lonjaret here; he is from France. We don't hack on it together in person too often, but we've been hacking the last few days. So I'll talk in the beginning about what it is, and he'll give you some updates at the end. Following this we have a birds-of-a-feather session from three to four for installing, so if anyone wants help getting their own instance set up and running, we can help you afterwards.

On the website we say Camlistore is a way to store, sync, share, model, and back up content, so I'll step through some of those and what it is. First I'll tell a little story about this crappy dining room table, or living room table. I was moving into a new place and I had no furniture, really, and we wanted to have Thanksgiving at my house and we didn't have a table. So we made this thing really quickly: I went to one store to get the lumber, I went to another store to get a saw, brought them to my house, and made a table. I was happy. I don't care at this point if the two stores go out of business; I still have my table.

Likewise, this is kind of how things used to work with computers. You would buy a computer, or you'd buy a disk, and you'd buy some software or use some open source software, but the end result is that you used some tools and you owned the thing that you produced with the tools. Nowadays it's not quite like that. Nowadays everyone just uses the cloud, and it's great; these are fun tools, and I use all of them. The problem is they have all the disks and you don't have any of the disks, so you're always at risk of something failing. There's this list of sites that have shut down, and people have lost all their data. So my main motivation is that I want to have all my stuff in, you know, 50 or 80 years or whatever, if I live forever. I don't really want to sysadmin a machine necessarily, but I want to make sure that I have access to all my
stuff in the future.

To reiterate some things here: this is a personal storage system. Think of it like your Gmail account or something, with all your email: everything goes in there, you don't have to care about deleting, you just search and find the stuff you want, and there are no quotas or whatever. So you just keep everything, you keep it for life, and you share selectively out of your personal store of everything later.

So let's talk about storing first. Oh, let's talk about the acronym. It was supposed to be a temporary name; unfortunately it stuck. It stands for Content-Addressable Multi-Layer Indexed Storage. I needed a directory name when I started hacking on this six years ago, and I haven't found a better name.

So it's content-addressable. At the base layer everything is just blobs. The base layer is very simple: it stores content-addressable blobs, and you can enumerate them. Here you can see that I put in "hello" and "world", and they have six bytes each, and you can get them out. It doesn't know anything about metadata. There are no permissions; I mean, everything's private. There's exactly one ACL: you can get your own stuff, and that's it. Anything that can get, put, and enumerate blobs can plug into the whole system. So we have implementations for, well, three implementations for local disk in various formats: one is a blob per file, one is a more efficient packed representation. It works on pretty much any cloud storage provider. Then we have wrapping layers that can shard and route your blobs around based on different conditions, you can encrypt on top of another one, and you can replicate to remote servers.

In terms of syncing, all this stuff can sync however you want to configure it. There are some default configurations that people use, but if you want to wire it up yourself, the config file lets you. I also have a co-worker who has a place off in the woods, a little weekend cabin, and he has a really crappy internet connection there, so it's not enough to sync.
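As an illustrative aside, the get/put/enumerate interface described above can be sketched in a few lines. This is a toy in Python, not Camlistore's actual code (which is written in Go); the class name `MemoryBlobStore`, the `sync` helper, and the `sha1-<hex>` ref format are assumptions made for the example.

```python
import hashlib


class MemoryBlobStore:
    """Toy content-addressable store: put, get, and enumerate only."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The ref is derived from the content itself, so identical
        # bytes always get the same ref and are stored only once.
        ref = "sha1-" + hashlib.sha1(data).hexdigest()
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def enumerate(self):
        # Yield (ref, size) pairs in sorted ref order.
        for ref in sorted(self._blobs):
            yield ref, len(self._blobs[ref])


def sync(src, dst):
    """Copy every blob that src has and dst lacks; return the copied refs."""
    have = {ref for ref, _ in dst.enumerate()}
    copied = []
    for ref, _ in src.enumerate():
        if ref not in have:
            dst.put(src.get(ref))
            copied.append(ref)
    return copied


# Two six-byte blobs, as in the slide, then a one-way sync.
home = MemoryBlobStore()
cabin = MemoryBlobStore()
hello = home.put(b"hello\n")
world = home.put(b"world\n")
copied = sync(home, cabin)
```

Because the store has no delete and no metadata, anything that implements these three operations could stand in for it, and syncing reduces to comparing two enumerations.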
He wants to do off-site backup between his main house and his little cabin in the woods, but he doesn't have enough network connectivity to actually sync over the DSL line. So whenever he flies to his weekend cabin he just brings a disk with him. Camlistore has a mode where it talks to the other one and says, "What are all the blobs I don't have?" Think of it like git pull or something. I mean, Camlistore is basically your massive multi-terabyte Git repo for life, or petabytes for life. So Camlistore has a mode that says "give me all the blobs" and puts them on a disk; he flies there and then dumps the disk to the other one, and so he keeps them in sync this way.

It also has importers from the outside world, so you can see your tweets or your Foursquare or whatever, and people are writing more importers. This is a screenshot of the UI. You can see there's a mix of my photos... I guess this screenshot doesn't have photos; I'll show some more. But it has check-ins that I've done the last few days at the hotel, some tweets, tweets with photos, and that's all being imported into Camlistore.

So how do I put all that in? We have to have a way to model all the content, so a big part of Camlistore is the data model. You saw on that first slide that Camlistore only stores these small, immutable blobs. You can't really build many things that way; it's not a fun way to interact with data, just at the blob level. So we have some higher-level representations of data. I want to emphasize that Camlistore is not file-centric at all: a file is just one sort of thing that you can model in Camlistore. For instance, with the command-line tools you can write a little "hello world" to a file and say camput file, and it gives you back the ref. But that ref isn't of the content "hello world"; that's some metadata, JSON that describes hello.txt. You can see the mod time and permissions, and you could put in symlinks and
directories and all this stuff. You can actually see here that one of the parts of the file is 13 bytes, and if you get that, there is your original "hello world" file. This could be, you know, a 5-terabyte file, and it does the whole Merkle tree thing with a rolling checksum, so if you shift the content by a byte it doesn't redo all 5 terabytes: it finds boundaries to split on and all that stuff. So it efficiently handles tiny things and huge things.

Likewise with directories. Here I'm making a directory called foo, I touch files called a and bar in it, and I put foo. It's of type directory, and it has entries, a static set of entries, and each one of those is a file. It's kind of cut off on the bottom, but it's not really important. So it does files, you can FUSE-mount it, you can use it like Dropbox, but let's talk about more interesting things.

How do I represent something like a tweet, or something mutable? So far everything we've talked about is these immutable snapshots of files. Camlistore has this concept called a permanode. A permanode is basically just a signed random number. Here we create some new permanodes. Think of one as a little stem cell of data that can become other things. Every time you make a permanode it puts it in a blob, and if we get one of these blobs, like that one, you can see it's camliType permanode, some random gibberish, and it's signed. It's signed by this identity, and that identity is just a PGP public key. It uses OpenPGP, and you can use your existing identity if you're already a PGP user; if you don't want to think about it, it can manage all your keys for you. So what we do is make one arbitrary number, this permanode, and that establishes the ownership of that object in the world, and then you mutate it afterwards. Here we make a new permanode, call it X, and we can put an attribute on X and set title to "some title". Now we think of a better title, so we can set the attribute title to "better title", and
then we can ask Camlistore to describe it, and it says the attribute title is "better title". But that original title, "some title", is not actually lost. If you look at camtool claims, you can see a list of all mutations that have ever happened on that permanode, and you can actually do queries in the past: you ask what this object looked like at some time, and it basically just ignores mutations after that point. So you never lose anything. Notice that the blob server storage interface on that first slide had no delete operations. This is append-only storage: you just mutate things and make new claims. You could make a claim that something is deleted, and then it looks deleted in FUSE or in the web UI, but then you could delete the delete claim and it shows up again. So yeah, you don't lose things.

At this point you say camtool list and you have a whole bunch of crap. Maybe some of these are chunks of data, maybe some of them are claims, some of them are permanodes, some of them are mutations or schema blobs. Making sense of that is the role of the next layers. I didn't mention search yet, but search is fundamental to doing any of this other stuff. If you think of this architecturally, you have your blob storage, which is all dumb. Then we have some indexing on top of that, making sense of what this whole mess is, because you can't really query it efficiently, so we index all that crap. Then there's a search system on top of the indexes, and there's some verification that signatures match, so you can't spoof claims. Then we have an API on top of that, and then all the fun stuff, the FUSE mounting, the web UI, the command-line tools, runs on top of the API.

The indexing works on basically anything that can store sorted key-value pairs, so any database you like is probably already supported. If you don't want to set up a database you can use local files; the built-in LevelDB implementation is used. The index isn't important to the safety
of your data, so you can keep all your blobs on something you trust, or two things you really trust, and have your index locally on your laptop. If you drop your laptop and destroy your hard drive and lose your index, it doesn't really matter, because you can always just reindex later.

Then there's search. We have two search systems. We have a search syntax similar to Gmail's little search syntax that you write in the query box, single-line, with colons and stuff like that, like label:foo. It doesn't give you all the control that you might want, so if you want you can also do a structured search. What happens internally is that it takes the Gmail-style cutesy search and turns it into the whole structured JSON search thing. When the server starts up it slurps all your indexes into compact representations in memory for the search query planner and stuff.

Here's an example of a search. We make a permanode. We do a search and say, show me all permanodes that have attribute tag equal to funny: nothing. Then we take that permanode we created and tag it funny, and now it shows up in the search. That shows that the blob is going in, the indexer is noticing it, the indexer is updating the indexes, and the search is finding it. You can write it like that with the little cute syntax, or you can write out a whole complicated thing: you can say, I'm going to do a logical OR of this sub-match and this sub-match. You can do lots of crazy things that way, searching for files versus permanodes versus raw blobs. And that's available via an API or the command-line tool.

More screenshots of the UI, just in case the live demo doesn't work. This is doing geo searches. This one is "location Seattle, is image, before 2014", so there's Gas Works Park. This is "location Moscow". Notice it found some photos of Moscow, but it also found check-ins in Foursquare, or Swarm as it is now. That's because the check-in has an
associated... there's a permanode that represents the check-in, but there's also a permanode that represents the venue, pulled from Foursquare's API, and that thing has photos associated with it, and that thing also has a GPS lat/long associated with it. So it's able to find that this check-in has an associated permanode, and that has a location, therefore the location of that check-in was within the boundaries of Moscow. "is check-in" is one of the search operator types, so you can say "is check-in in Seattle". There's "is pano", which is a cutesy little search operator that just says the width is over a certain ratio to the height. In the web UI you can go in and look at the details of stuff: here are the attributes on a permanode, and you can edit them if you want. You can look at the raw blob, same as with the command-line tool, and see what the blob is and what the indexer and search system have to say about it.

So let's do some live demos. This morning I posted this poll on Twitter and said: day of the Camlistore talk, should I write the talk or should I add features? And adding features won. So let's see if this works. Okay, here's my live server. We're going to take a picture of the audience. I don't know if I can see you all. It's a good camera. Everyone look excited! Okay, let me go over to Twitter. There is also a live uploading mode, but that is not the new feature, so I want to try the Twitter one. I wrote "live Camlistore demo", and I'll attach that photo. Okay, it's tweeting... sending tweet... tweet sent. Hey, it showed up.

So what happened there is that Camlistore had a live connection open to Twitter. When I posted to Twitter, Twitter sent something on the live user stream connection, and the importer ran. The browser also has a WebSocket open to the server, so Camlistore notices whenever you have the UI open in the browser, and it looks at any search queries the UI is looking at. This is all written with
React, and if the results of the query you're subscribed to change, the UI updates dynamically. Meanwhile, that's stored at my house, but I also have a real-time sync going to S3, so in the process it just uploaded another meg or something to S3. If I didn't trust that my S3 copy was in sync, I could start a validation, and you can see that it runs in the background, scans both my local machine and S3, and tells me whether things are happy or not. I have millions of blobs.

Let me show a little demo here. Okay, so, camtool list: this is my local laptop, a dev instance. You can also mount it. I'll do cammount on a directory in my home, and I can cd into it, and you can see there are some directories like sha1 and whatever. So let's go into the Camlistore source tree. I'll say camput file doc, which is the doc directory. If I camget this, you can see it's a directory, and if I run find on it, you can see this is all the documentation for Camlistore. When we put that in, we got this identifier, which is a static snapshot of the content. Now, in the FUSE filesystem, if I cd into that directory I can see it in FUSE, or I can open it like this and read the docs. This could be on a remote machine, and it could all be cached.

But I could also do stuff like this: I have this directory here called recent, and if I go into recent I see these. But this directory is not actually a real directory; this is doing a query for whatever my recently created permanodes on the server are. So I can see, you know, here's an Army of Darkness deep dream that I just dragged over. We could drag something else over there, so, for lack of something else, we'll take a screenshot of this, and now I have it on my desktop. I'll drag it into Camlistore. Oh, piggy. It's sad piggy. Sad piggy means that the WebSocket connection isn't
active because I restarted the server. Okay, so there's the thingy. Let's do a better one so you can actually see the thing work. Okay, so now it should upload. Okay. And if we go back and go into recent, now you can see that those showed up in my recent folder. It's not really a folder: you can't delete from here and you can't add to it, but when you add permanodes in the web UI they show up in FUSE and vice versa. You can do searches via FUSE too.

Any of this stuff can be routed however you want. You can have all your blobs live encrypted on S3 and unencrypted at home, but cached on your laptop and faulted in on demand, and all sorts of stuff. Or you keep all the metadata locally but all the big blobs remotely, with your thumbnails locally. Some of it's more of a pain to configure than it should be right now, but everything is possible with the right configuration. For the default server there's that tool to dump the easy config to the complex config.

Okay, so I actually have a config file here, in .config/camlistore, or what is it, Library? I forgot what it is on OS X. Camlistore uses whatever the correct... no, it doesn't. I thought it did. Okay, so this is the simple config that most people use. If you start Camlistore by hand, say you go get Camlistore or you download it from the website, it'll use... this is LevelDB, right? Do we just use the old file name, kvdb? Yes. Okay, well, this is the old one I have. LevelDB is the default nowadays; I'm using an old sorted index. Then all my blobs are just going in this directory. Oh, we do use Library for blobs. You can see I have a file per blob. It's not necessarily efficient, but it's easy for debugging and getting started. On some file systems it's efficient: if you use, you know, ZFS or btrfs it's more efficient, or ext4 with tail packing or whatever. So that's the simple config. If you want to see what it actually maps to,
there's this camtool dumpconfig, and it shows you the low-level config, and you can wire things up however you want. Each of these is kind of documented a little; the source code has better docs at the top of each one of these plugins about what keys and values it supports. And the error messages are really nice: if you screw it up, it tells you what you're missing. So you can, you know, have your thumbnail cache somewhere else.

If I go to my personal server here, you can see it does infinite scrolling of stuff and faults it in on demand. It'll... yeah, kind of infinite. This is loading more stuff.

So we actually do have downloads: camlistore.org/download. Ugly HTTP; we're all going there. So there are binary downloads, and that's all done with Go. There's a zip, and you can download Linux, OS X, or Windows binaries. It's all Go, so we can just cross-compile to anything; you can run on FreeBSD or Plan 9 or something weird.

If you're lazy, though, and you don't want to run it on your own, we also have this thing that we've just created. This is Mathieu's project; he made this launcher, so you can create an instance on GCE. We fire up a VM and a container running your indexer, it runs MySQL in the container and puts all your blobs on Google Cloud Storage, and you pay Google, not us. So this just does an OAuth flow and redirects you off to Google, and it instantiates it on a CoreOS instance, so you don't have to really sysadmin things, and it auto-reboots with security updates and stuff. So some people, later, if you go to the birds-of-a-feather thing, can install it that way, or some people can just install it by hand. It runs on things like Raspberry Pis or Scaleway.

Were we going to talk about anything else? I'm trying to remember what I skipped. Yeah? The question is how remote data stores are secured, for blob storage. Well, if you're the admin on the machine and somebody
else is using your machine, yes, you could look at it if it's not encrypted. They could choose to encrypt it, but if you're the admin of the machine you can screw with them in a lot of ways, including, you know, owning their crypto code. So, well, maybe they shouldn't trust you.

Oh, that? No, it's optional. The PGP key is created automatically if you want. It's the unit of identity. The grand vision of this is that it can be multi-user too, and you have multiple users mutating permanodes, so the signature that signs the mutation says who did it. The indexer verifies signatures, so other people can't spoof your stuff. Let's say a friend shares a JPEG with you and you download it, and they put some crap in the EXIF metadata that the chunker aligns just right, so it looks like a claim mutation saying that you're making all your stuff public (sharing is another thing to talk about). If they don't have your private key, they can't spoof claims as you, so the indexer can never be confused.

You don't have to sign files. When I wrote "hello world" to a text file and did camput file on it, I just got this back, and there's no signature on it. If two users have exactly the same file with the same content, the same modification time, and the same permission bits, they'll get exactly the same thing. It's just like Git: in Git there are the blob and tree objects, you build those up into the big old Merkle tree, and you sign the root; you sign the commit or the tag or whatever. It's the same sort of thing here. You'd make a permanode, and you would set its camliContent to something, to associate that node with something else. Let's go into the web UI here and see if this works. I could say camput permanode: nothing. You see nothing new in the UI because permanodes with no attributes are not interesting, so we
don't render them. But if I say camput attr with title foo, now you can see it. Let me make it a little bigger. And now I set it to bar, and it changes to bar. Or I could set camliNodeType to twitter.com:tweet, and I think I have to set a couple of others... is there a start date? No, I'm just going to show what it's like. I don't know how we got started on this. So, yeah, all these things are signed, and all these objects, like tweets, are just a node with a collection of attributes. I don't know where I was going with that. You... well, you can sign the top-level things.

Okay, so we're going to do camput file with the permanode flag on a text file. So we just put in a hello world text file here, and if we look at the permanode, you can see its camliContent is set to this. If I look at the raw blobs for it, here's the list of claims, and there was this claim that set that permanode's content to be this value, and this value was just the static file. So that thing is unsigned, but the thing that is signed is the mutation, this one. Here's the blob content: this is a set-attribute claim, it's modifying this permanode, that's the subject, and this is the new value of the attribute camliContent, and that's the signature, which has to match this signer. So you sign things at the top, not the base data at the bottom. As far as encryption goes, you can choose to encrypt all your raw blobs, either locally at rest or remotely at rest.

"For synchronization across networks where all I've got is port 80 or 443 that I can send and receive data on, can Camlistore get around that sort of stuff?"

So the question is about synchronization and what ports have to be open. It doesn't listen on anything by default, so it would connect out to something else that was available and then sync over that. And that is not as automatic as we want right now, for better or worse, so you kind of have to tell it when you want it to sync. There's no actual automatic... no. Yeah, you
wouldn't... you would have to configure it to do it on, you know, a Wi-Fi connectivity change, or on power. But yeah, we want to make that more automatic in the future.

So if we go to Camlistore, this is my production one, and go to the importers, these are the current ones: arbitrary RSS or Atom or whatever, Flickr, Foursquare, which is Swarm, Picasa, which is all Google Photos, Pinboard, Twitter. Kate here was working on a RunKeeper one. I have a lot of data in RunKeeper, we both have a lot of data in RunKeeper that we want to suck in. We want to work on one for Google Android Wi-Fi logging stuff, so you can have all your tracks in here if you're recording them that way. But these are pretty easy to write. With the API that's provided, you don't have to think about blobs or signing. You basically don't have to think about scheduling either; it'll automatically wake you up. For Twitter, today I added long-poll functionality, so if you optionally implement a feature to wait for new updates, you can get run when there's new stuff, or you can just be scheduled automatically.

I can go to Twitter here and it tells you what to do, how to configure Twitter: go to this and create an app. We don't have a default app for a lot of these because we just didn't want a single point of failure, where if our app is shut down your thing stops working. So you create an app on Twitter and you put in your client ID and secret. You can have as many accounts as you want, and you can configure the policy for how often it runs. This one is running every 30 minutes, but it's also long-polling all the time, so the tweets come in within milliseconds. For almost all of these it'll go back in time in the API as far as it can. Twitter's API is kind of busted in that it only goes back like a thousand posts or a hundred posts or something dumb, but this one also supports... notice here it says bradfitz, a Camlistore ID or Twitter ID, with that plus
zip file: you can upload your Twitter data takeout zip file that has all your history, and it will recognize that and slurp in all the content, which annoyingly has a slightly different JSON format, almost the same but different, from the API's JSON format. So yeah, new importers would be a great way to contribute to the project, because they're pretty easy to write and they're pretty fun. Facebook is missing; it should be there. I don't really use Facebook as much as some people, so I'm not personally motivated to write it.

Oh, so one of the big features coming up... right now, when you go to that Camlistore launcher thing... well, I'll let Mathieu talk more about that.

Okay, the question was: where is the metadata? There is no difference between data and metadata. When I say camtool list, this is both my data and my metadata. Let's get one of these. I don't know what this is... that is metadata. Let's get another one. That one is documentation; that is part of some file. In this case it happens to be the whole file, but this could have been a 64 KB chunk of a huge file. It is the job of the indexer to go through this list of all this stuff, make sense of it all, and build the right indexes for search.

That is another good question. You can create your own metadata, like with that camput attr, tagging something funny or putting a tag on things. There's also metadata that the indexer synthesizes itself, which is less subjective, like: the EXIF said these are the dimensions of the photo. That's just true; you can't lie about it. So it indexes all the GPS data from photos and builds an index of that. But then when I did a search query and said "location Seattle", it did an API call to a geo provider to find the bounding polygon of Seattle or whatever.

There was a slide earlier, you might have missed it, saying that you cannot delete anything. So if you set title to foo and then you set title to bar later, you can always go back and look at the world
as it existed at any point in time. It's just a log of mutations. You could go by hand and delete it, yeah. There are some overrides to say "delete, delete, yes, yes, yes, I really mean it".

So you would just encrypt it locally and upload the encrypted blobs to your friends, and they wouldn't be able to see them at that point. So, right... why don't I come back to that question. Do you want to talk about some stuff? I'm coming back to it.

So yeah, I'll go quickly. The first thing is that Camlistore is a pretty big project. We try to do it well, but some of those packages were not really very usable outside the context of Camlistore, and they should be. So we've been working a lot on moving all of those, and improving them generally, to put them in this other repo, which is go4.org. We're hoping to get more exposure, contributions, and testing for those packages. The other added value in doing that is that we get Travis CI for free, because it's on GitHub. We used to have a builder, a continuous builder, for our project, and we were hoping to switch to the Go project's one, and we kind of let that drag a little bit, so for a while we have not had any testing going on for the whole project. So the more we move to this GitHub repo, the better. Let's skip the rest of the GitHub thing.

Also, our website used to be updated only occasionally, and now we've made it all automatic: on every commit it scans and updates all the pages, and the documentation itself is always refreshed.

We've also been getting a bit of grief over the fact that we don't do point releases as often as we would like to, even though we continuously work on the project and the Git repo is always kept up to date. So we've worked a lot on improving our tools to make new Docker images, new binary releases, and source releases, everything automatic, and we are on the
finishing touches to do monthly releases, so we will be able to point people very soon to "okay, this is the latest release", and you can use that if you don't like Git or whatever.

And then on to the more exciting stuff. Brad showed you the launcher that we have, which runs on Google Cloud: your instance is on GCE, the blobs are on Google Cloud Storage, and we log everything to Google Cloud Logging. What we plan on doing is to host a sort of DNS system for camlistore.net, so that when it gets a GPG identity it gives you back a subdomain on camlistore.net. Then, thanks to our friends at Let's Encrypt, we would automatically create an HTTPS cert for that subdomain, so that this launcher will directly give you a Camlistore instance which is on HTTPS, with a domain name and with a valid cert. This is the main new feature we're going to have for sharing, I think.

Yeah, and then getting back to your question. Was it about how to discover people's keys or something?

"Well, I was thinking just in terms of doing cross-Camlistore syncs to different computers, and what to expect."

Okay. I mean, your network traffic is proportional to how much crap you put in it, right? What we eventually want to do... we don't currently do much peer-to-peer stuff, because there aren't many users. But as we make it easier with this cloud launcher thing, and once we do automatic SSL, we're probably going to either run or use an existing DHT, a distributed hash table, to find peers and stuff. In that case, if you have two users that are behind NAT, you can go into the Camlistore web UI and list your friends who are willing to host your encrypted data, and just put in their key IDs, or their email addresses, which can then map to their key IDs. You don't even need to know their IP addresses, and they don't need to open a port. We do the normal NAT-busting stuff to find
them through the DHT, wherever their current IP addresses are. So that's an eventual goal, because then we also want to do sharing. So if you take a bunch of photos, you can already go into the web UI and say, oh, there's a picture of Mathieu, and there's that, and that. And notice when I check these over here... oops. Oh yeah, you can go through and click left and right and view photos, but that's a web UI. Once I check one, notice it opens up this thing over here. I guess I have to hit that click target exactly. You can then create a set with those items, and it created a new set with that. And we want to be able to share that. Right now I can go in and I can say camput share. Let's say I put that file, that doc directory; now I want to share that with somebody. Or maybe it's that set of photos. So I say camput share that, and it gives me back this thingy, which is a share claim. If I were to get that: this is a share claim that says, if you authenticate using type haveref, which is basically no authentication, like, you have the secret token, where the token is this thing, then I grant you access to this. Transitive is false, so you only have access to this one blob, so that's not very useful. So you can say camput share -transitive, and now I have this one, this is what I actually wanted to do, and now it's a transitive share claim there. So this is like a token I can give to the other person, and then they can mount this remotely and fault in the blobs from my server. And so they don't have to download the whole thing; they could just go in there, you know, in their OS X Finder or Nautilus or whatever, and it would download the thumbnails of the JPEGs from the beginning of the JPEG, and they don't even fault in all the data until they access it. But we want to make all this automatic: instead of having to do camput share, in the UI you instead just share to them and they get a
little live notification on the side of their Camlistore that photos were shared with them, and then we do the whole decentralized social network thing, because why not, and, yeah, screw the centralization. The question is about integration with, like, Office and such tools. And, I don't know, do you care about the data model, or do you care about the storage of the XLS bytes? We do not currently model spreadsheets or anything; we'll leave that to OpenOffice, XML and stuff. Storing the bytes, you can. Well, if you FUSE-mount it, it shows up, and you could put it in your favorite places over here. So like cammount... um, am I already mounting it? Perfect. So here it is, and I could be like... no, I can't do that. How do I go up? I could be like, camli, boom, it's in favorites. So now you'd save something from Microsoft Word into the Camlistore roots, and, boom. And now, yes, okay, so let's see if this works. So here's the web UI, and I've created this directory, camli/roots/foo, and I will write "hello" to northwest.txt, and it showed up there. So I just wrote to my local file system, or what looked like my local file system, but it just, you know, replicated to everything. Yeah, so for the video, the question is about metadata extraction from Microsoft files or whatever. We currently only extract data from all the image formats, some of the video and audio formats, and various other open formats, but yeah, we don't have metadata extractors in the indexing system for everything. That's another way that you can contribute. Somebody added the MP3 ID3 tag extraction, so if you put in all your music you can do a search by genre and stuff like that. No... well, it has to be indexed to be searchable. I think I had one slide earlier with the architecture diagram. The question was: he likes his Dropbox text editor on his phone, but this is just viewing data, not editing data. So what we hope is... we've kind of fleshed out all this stuff, and now we're kind
of moving up the stack a little. Importers are pretty easy to write at this point. The command-line tools are pretty fleshed out. FUSE is okay; it needs a little polish. The web UI... we don't like JavaScript, but we're looking at getting that better. Apps is interesting, because now that we have a nice API, we want to be able to make it really easy to run apps that use the API and live in a sandbox, kind of similar to Sandstorm or whatever, where you run in a container with limited access and we give you a Camlistore handle to the API, but anything you write is in the sandbox, anything you read is in the sandbox. If you're in the outer sandbox, in your main UI, you can see everything that all the apps wrote, but one app can't go rogue and delete other stuff. So I had an old document management system that I wrote for, like, scanning stuff and putting it in the shredder, and that was an old CMS that I ran on something else, and we've been migrating that to run on top of Camlistore as kind of a proof-of-concept, hello-world app. I hope that other people build other apps, especially mobile ones. And you were going to talk about your data collection thingy? Oh, and the Android document provider. Yeah. [unclear] Okay, so this is the Android app, which I can't really show, but I can do "share with Camlistore", and you see it's uploading, and it's stuck 50% of the way through, but you can see the bytes showed up. Yeah, so that's an Android app, and it hasn't been updated in, like, years, or not substantially updated, so it doesn't use the new Android document picker, kind of FUSE-like framework. But if we implemented that, then if Camlistore were installed on the phone at all, it would just show up natively as a place to load or store stuff from, in any Android app. So if you like Android development, you're welcome to do that. Yeah, yeah, fun stuff. Did you have another question? Okay, yeah. Oh, I, I didn't
even mention... I take it for granted that when I say content-addressable, I mean everything is deduped for free. So yeah, I had bazillions of duplicates of all my photos, because I had that old backup, and that old backup, and that stuff on Dropbox, and now stuff on Google Drive, and now stuff on that machine, and every one of those backups was a subset of the other backups, and when I put them all together it was terabytes of crap, but it wasn't really that much; it was really just duplicates. And so yeah, this dedupes everything. Well, it'll for instance keep track of the fact that you had two files at different paths, maybe with different case in the file names, but that's a couple of bytes, it doesn't really matter, it's a drop in the bucket. But the bytes of the JPEG themselves, there's only one instance stored, and you can't even make it store more than one. Well, there's no delta involved; it's just that you cannot store the bytes "foo" twice, because it has exactly the same SHA-1 and it's content-addressable. I mean, it's the same way git or anything else does it. You can verify all your stuff. And we're currently using SHA-1, and waiting for somebody to complain about this, but all of the blobrefs... it's designed to be future-proof, because I saw MD5 come and die, and I saw SHA-1 come and almost die, and so I'm waiting for SHA-256 to die too. But it's future-proof in that you put the hash type in there, and you can configure the policy on your server about which ones you accept. So once SHA-1 is broken, which is effectively now, you stop accepting them, but you can still verify things in the past. So we're going to be switching to, you know, SHA-256 or 512 for upcoming releases. Uh, yeah, I think... I get a lot of questions like, have you mixed X with Y with Z, and a lot of these things are possible, but it's sometimes unclear why, other than just mixing it. Okay. The models may not match. I'm not sure; I don't
know what their API looks like. Yeah, make a blob server, that's it. Yeah, so Syncthing could be a blob server implementation. Yeah, yeah, absolutely. Maybe one of you wants to step up and write one. Um, so okay, so back to sharing. I showed you... well, I did that camput share -transitive thing, right? So right now that is just sharing a snapshot of something, and you could say, here's a bunch of photos. We also want to be able to share a whole set, but the one thing we want to do is be able to share a search query. So you could say, like, my last 30 days of photos, or really anything you view. And this UI here is a search query, even your scroll position and stuff, so if you wanted to share what you're looking at right now, you snapshot that search query along with the time and all that stuff, and you give it to somebody else, to say: you can get anything transitively reachable from the results of this search query as the first hop. So once you do that, it's kind of fun: you could share something you haven't created yet. You could say, we're going on a hike together, you have access to all my photos on this phone from this time, for an hour, or till the end of the day, and your server could be slurping them in in real time, as you upload them, even though they haven't been created yet. "What happens if somebody is a jerk and uploads that 8 GB ISO?" Well, you're the jerk, because this is your own data. Oh, well, there is a garbage collector. It's not... so you mean you made a mistake and you want to delete it? Yeah, yeah. You would delete the permanode, and then there's like a claim that says, actually delete this thing, and then you could run the garbage collector over everything, which kind of doesn't really work; it's not automatic right now. I feel like there's a mode when you start the server up or something... there's a to-do for it that says, before the indexer starts, before we start serving, go over the blobs, run
the garbage collector, and actually delete the blobs that are not referenceable by any claims, including the ones you've deleted. Yeah, this is running... I mean, this one's running on my laptop, and I was hacking on it offline. Yeah, you don't need any cloudy thing. So our kind of model about the cloudy stuff is: use it, but don't trust it. And so we happily use all the tools, and we happily use all the storage, but we use the storage twice, and we import the data from the tools, so any cloud thing can go away and we'll just switch to the next one. Yep. I mean, you would configure on every node what the policy is. You don't set a path through the network on a single blob, but you could say on the first node, send all my crap to this friend, and on that friend's instance you could have it configured to go beyond that. Yeah, for instance, I run a primary server, and my phone goes to it, and my other stuff faults in from that, but then it uploads to S3 and stuff like that from there. But yeah, all that sort of syncing to your friends is kind of an upcoming focus. Our kind of next release is all about sharing, and automatic TLS, and automatic discovery of friends, and syncing and sharing with friends, and backing up to friends, and stuff like that. Yeah, we don't do releases that often, and so it makes the project look good. The UI... it's all REST, yeah. We have clients. And so if you want to serve a static website or whatever out of a certain hostname/path, or hostname and just slash, you could say that equals this permanode, and walk it as if it's, you know, FUSE or something like that. But it's not actually using FUSE; it walks it the same way, walks down directories as FUSE would, and serves the right content type. Then, you know, it's basically an HTTP-to-Camlistore converter, and we call that publish. It's one of the kind of hello-world apps. This UI, if I go to server status, that's a debug config. Yeah, we actually do
yeah, so the UI is basically just one app. This is the auto-generated low-level config; we didn't actually write this config. But it is a built-in app, and you can also run other apps as child processes that use the same API. Actually, we should break the UI out into a child process in the future. Yeah, if you wanted it to be auto-uploaded, yeah. Yeah, otherwise you can run a cron job that says camput this folder, yep, recursively, and it caches the, you know, inode stat to metadata lookup, so it tries to be nice to your file system and stuff like that. But Linux is generally fast; OS X sucks at filesystem speed. Yeah, yeah, I mean, whatever backups you're doing now, you could replace with it, right? Just think of it as a backup mechanism; it's a snapshot at a certain time. Um, I actually back up my home directory with it, and just make a new permanode for every, like, hour... so, I update an existing permanode with a claim, so then I could go look at my whole home directory as it existed at any point, with a granularity of, like, an hour or something. For Macs that is different; that uses HFS and hard links [unclear]. Oh right, somebody made an OS X client, this was a while ago, that let you look at a permanode, and it had a slider, and that slider changed the parameter to FUSE for "at". So you can cd into a directory: if I go into camli, notice there's a magic directory called "at". I think it has a README. Oh, look at that. And you can specify a time and then cd into that. So I can say, you know, go into this, and now there's that foo root I had, and that file did not exist at that time. But if I go instead into, I don't know, next year, then now the file appears; it hasn't been deleted yet on my server. And so it's possible to do the integration. I don't think that's a good way going forward. Yeah, well, it's all the way down, to some degree. Metadata that was explicit, because you set tag equals funny or something on photos, that's
just a blob. But if it's an index that's just there for efficiency reasons, for sorting, like genre equals electro punk or something on your MP3, we can always get that later by reindexing. That's in the sorted key-value thing, and that one isn't important; if you lose it, you can just recreate it. And you could choose to have those in different places, and they're very different data stores. So the sorted thing you can have locally, on your SSD on your laptop, and your blobs you could have remotely, somewhere else. It does not need plutonium; you just have to install Go. All right, we're out of time, so come to the birds of a feather in 15 minutes. Okay, look at the website for where it is: 201 in this building. So is that upstairs, I guess?
Info
Channel: Brad Fitzpatrick
Views: 13,250
Keywords: camlistore, storage, opensource, golang, bradfitz
Id: 8Dk2iVlc67M
Length: 60min 35sec (3635 seconds)
Published: Mon Apr 25 2016