The Challenges of Writing a Massive and Complex Go Application

Captions
Thank you, Terry. As they said, my name is Ben Darnell and I'm the co-founder and CTO of Cockroach Labs. I'll be talking to you today about things we've learned while developing CockroachDB, a new database that's currently in the works; it's all open source and written in Go. I'm going to start by giving you a little more background: what we're trying to do with CockroachDB and why we started out writing our database in Go. Then we'll go through some of the areas we found especially interesting or challenging, and some of the lessons we've learned while dealing with the issues that come up in those areas. I'll talk about garbage collection, which is a very big concern in Go when you're building something as performance-critical as a database; about goroutines, concurrency, and synchronization; about cgo and managing those kinds of dependencies; and then some gotchas we've run into with Go's support for static binaries. So first of all, what is CockroachDB? It's a new distributed SQL database, currently in development. It's horizontally scalable across any number of nodes, it can survive failures of machines or even entire data centers, and it's strongly consistent even while those failures are happening, so you have full transactional support in your SQL and everything stays consistent no matter what happens. It's all open source under the Apache license, and obviously it's written in Go. We're far from the largest Go project out there, but we're fairly big. Just to give you a sense of how big the project is, and why you should listen to us when we talk about what it's like building a large Go
application: the project has been around for over two years now. The first commit was back in February of 2014, we've had over 10,000 commits from 93 contributors, and currently we're in the neighborhood of 182,000 lines of code when you subtract out the generated files, which in our case are mostly protocol buffers. One question we get asked a lot is why we're building in Go. This is not an obvious choice if you're building a database; a lot of people see that as the domain of a non-garbage-collected language, because you need a lot more control over memory management. But we decided to go with Go, and we've been really pretty happy with that decision. When we were starting the project two years ago, we didn't want to stray too far out of the mainstream in terms of programming languages. Our top contenders at the time were C++ and Java; we were also interested in Go and Rust, the up-and-coming systems languages of that time. To run down the choices in a little more detail: C++ was really our default choice going in. It was the language the founding team had used through most of our careers; we knew it pretty well, and we liked it better than most people do, really, so this would have been a very comfortable choice for us. We knew that C++ gives you very good performance: you can get as close to the metal as you need to, and you don't have to deal with garbage collection or worry about that sort of uncertainty in performance. On the other hand, when you're dealing with manual memory management and raw pointers, it's very easy to make mistakes that can have very severe consequences: you can
mess up pointer management, end up writing into data that's already been freed, and completely ruin the state of the program with no sane way to recover. Also, the founding team had all had a lot of experience working in C++ at Google, and Google has very good infrastructure for working with C++: distributed build servers, compilation caching, and things like that. Google also has a large volume of coherently written code that all fits together very well. Going out into the outside world, we'd be starting from scratch, needing to make a lot of choices about libraries and finding pieces that fit together: do we use Boost or not, do we use exceptions or not, and so on. All of these things have implications for what libraries are available to you, so it was a very daunting prospect to try to get started on a new project with C++, and that's what led us to look for alternatives. We also considered Java, another very widely used language, even in a lot of competing distributed databases like HBase and Cassandra, but it wasn't a possibility that really excited us. One thing about Java that would have been a pro for a lot of people was kind of a con for us: the widespread use of IDEs, and the atrophy of any other mode of operation. Again, a lot of people like this; we had tried using Java IDEs and didn't really take to them, so from our perspective it would have been a drawback. The other big problem with Java, of course, is performance, especially around the garbage collector. We knew from experience that Java garbage collection led to large and unpredictable pauses in your application, and you were
constantly struggling to keep your garbage-collection pause times down so they didn't interfere with the application. We'd actually had some first-hand experience on a previous project where we rewrote a service that was originally in Java: we moved it to Go and saw really huge improvements in performance, and garbage collection went from being a major concern to being a non-issue in the Go version of that service. That's what gave us some hope that Go would be a viable option here, even though it's also a garbage-collected language. Now, Go a couple of years ago was a bit of a younger language than it is now; I believe this was in the Go 1.4 timeframe. Go had been nominally production-ready for a couple of years at that point and was still making its way into mainstream usage, so it was a young and somewhat untested language, but it was growing in popularity and had a lot of appealing aspects. In particular, it was a very simple language, very easy to learn, and it had a lot of the kind of Googly design aspects that we found appealing given our backgrounds. But again, our major concern with Go going into this project was the potential for performance issues related to the garbage collector. Finally, we briefly considered Rust. Rust is an even younger language than Go, and it's very exciting in terms of what it can do to give you memory safety without any runtime overhead: no garbage collector or anything like that. It also has some nice features for managing concurrency and making sure that you do things in a safe way when you're sharing memory. But it's also a much more complex language:
where Go gets compared a lot to C, Rust is much more in the C++ vein of including a huge number of features that can be very complex to work with. And the language was young: Rust was definitely not ready for us to use in 2014. If we were making this decision again today, maybe the outcome would be different, but at the time Rust was just not ready for us to adopt as our main programming language. As I said before, the biggest concern we had about Go, both when we were making the choice two years ago and as we've confirmed over the ensuing two years of development, is garbage collection. Garbage collection makes it very easy to write applications, but it can impose performance penalties that are sometimes difficult to predict and manage; in particular, it causes your application to just pause from time to time. So if GC pauses were the thing that basically disqualified Java from consideration for us, why would we even consider a language like Go that is also garbage-collected? The real reason is that we think Go gives you better tools to manage the allocations your program is doing, so you can reduce the impact of garbage collection in a way that is not really possible to manage explicitly in Java. In some cases the JVM and the JIT can do enough analysis of your program to make some of the same optimizations that we're doing by hand in Go, but we found that the JVM is not smart enough to make these kinds of optimizations reliably enough that you can just write Java and not have to make these kinds of manual tweaks. In particular, the manual tweaks I'm talking about are the ability to make better use of the stack
and control what actually makes it out onto the garbage-collected heap, and the ability to combine multiple allocations into one. To give you an example: the goal here is to minimize the number of allocations, because the cost of the GC is roughly proportional to the number of allocations you're doing, not the total amount of garbage-collected memory. That means that if you are using multiple objects at the same time and for the same duration, you can allocate them all together rather than allocating them separately and having them be separately tracked by the garbage collector. This is a real example from the CockroachDB code: we have a putBuffer struct that contains two MVCCMetadata objects and one TxnMeta object. These are objects we use in our MVCC subsystem when we're writing values to the underlying storage engine. These values could have stayed off the heap if the Go compiler were a little bit smarter, but it turns out that the compiler's escape analysis decides they need to go on the heap; they can't be allowed to live on the stack, even though they are used only within a single function call. Since we have to put these values on the heap anyway, we batch them up into one struct to make it one allocation instead of three. The second optimization we often use along with this is sync.Pool. sync.Pool is a type in the Go standard library that looks kind of weird and magical: it has a Get and a Put method, and you can put objects into the pool and get them back out, but there is no guarantee that any object you put in will ever come back out. It's kind of a weird
thing to get your head around, but what it's for is essentially a specialized free list for frequently reused objects. This is a way of taking a little bit of memory-management control away from the garbage collector and putting it in the hands of your application, and as with all manual memory management, you can get it wrong. In particular, it can lead to the same kinds of use-after-free bugs you'd get in C, so it's something you have to be careful with. You don't want to use it all over the place, partly because it's kind of verbose, but in places where profiling reveals that you're doing a lot of allocations of the same kinds of objects, and those objects are not actually long-lived, you can use sync.Pool to speed things up. Here's an example of using it with the same type I just defined. You define a sync.Pool object and give it a New function, which is a factory for whatever object you're going to put in there; in this case it just allocates an empty putBuffer. In some cases you can do more in the factory, but most of the time this is all that's in it. The pool's interface is defined in terms of Go's empty interface type, so anywhere you use it you have to cast, and we usually use a little helper method to do that. Then, when you're done with the object, you release it back into the pool. The pattern where we do *b = putBuffer{} is a simple way of zeroing out the object so that the next caller who reuses it doesn't inherit our dirty data: every caller always gets a brand-new, empty object. That's usually a good practice to follow if you're using sync.Pool, although there are occasionally reasons why you may want to leave a non-empty value in the pool.
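A sketch of the two techniques just described, batching several objects into one struct and recycling that struct through a sync.Pool, might look like this. The field types and names here are stand-ins, not the real CockroachDB definitions:

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-ins for the MVCCMetadata and TxnMeta types mentioned in the
// talk; the real definitions live in the CockroachDB source.
type mvccMetadata struct{ rawBytes []byte }
type txnMeta struct{ id uint64 }

// putBuffer batches three objects that are always used together, so
// escaping to the heap costs one allocation instead of three.
type putBuffer struct {
	meta    mvccMetadata
	newMeta mvccMetadata
	newTxn  txnMeta
}

// The pool's New function is a factory called only when the pool has
// nothing to hand back.
var putBufferPool = sync.Pool{
	New: func() interface{} { return &putBuffer{} },
}

// newPutBuffer hides the interface{} cast behind a typed helper.
func newPutBuffer() *putBuffer {
	return putBufferPool.Get().(*putBuffer)
}

// release zeroes the buffer so the next caller gets a brand-new,
// empty object, then returns it to the pool.
func (b *putBuffer) release() {
	*b = putBuffer{}
	putBufferPool.Put(b)
}

func main() {
	b := newPutBuffer()
	b.newTxn.id = 42
	b.release()

	c := newPutBuffer()      // may or may not be the recycled object
	fmt.Println(c.newTxn.id) // always 0, thanks to the zeroing in release
}
```

Whether or not the pool hands back the recycled object, the zeroing in release guarantees the caller never sees stale data.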
Such leftover values can occasionally be useful for a future reuse of the object, but that's not a case that comes up very often, and I don't think we have any examples of it actually happening. So what does this look like in the application code? You call newPutBuffer, which gets a putBuffer from the pool, and then you use it; here we pass it into this MVCCPutInternal method, and once that's done, we release it. As a side note, we could have put a deferred buf.release() right after the call to newPutBuffer, but because this happens in performance-critical code, we found that using defer is actually measurably slower than just making the call on its own. So in places that are performance-critical, we release the object manually, and you'll notice that the release comes before we actually do anything with the error: right after this line of code there's another check saying if err is not nil, do whatever we're going to do with that error. By combining these two techniques, allocating structs together and using sync.Pool, we've managed to get a large part of the critical path of the database to do no garbage-collected allocations at all; everything is managed through either stack variables or pooled objects. In Go's built-in benchmarking tools, one of the stats that can be reported is how many allocations are done per iteration, and in our benchmarks of the low-level storage code we can actually see that there are no new allocations happening per iteration: everything gets allocated up front and cached. Another issue related to garbage-collection performance is the use of values versus pointers.
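Going back to that allocs-per-iteration stat for a moment: Go's testing package can surface it programmatically as well as from `go test -bench`. Here is a toy benchmark (not CockroachDB's actual storage benchmark) showing a pooled hot path that amortizes to zero allocations per operation:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

type putBuffer struct{ n int }

var pool = sync.Pool{New: func() interface{} { return &putBuffer{} }}

// doPut stands in for a hot-path operation that borrows a pooled
// buffer instead of allocating a fresh one on each call.
func doPut() {
	b := pool.Get().(*putBuffer)
	b.n++
	*b = putBuffer{}
	pool.Put(b)
}

func main() {
	// Normally this lives in a _test.go file and runs via
	// "go test -bench . -benchmem"; testing.Benchmark runs it inline.
	r := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			doPut()
		}
	})
	fmt.Println("allocs/op:", r.AllocsPerOp())
}
```

The one allocation made by the pool's New factory is amortized across all iterations, so the reported allocs/op rounds down to zero.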
Values versus pointers is probably the issue where CockroachDB's style differs the most from the style of the Go community at large: we prefer to use value types by default, and only use pointers when there's an explicit need for the object to be mutated. We found that people coming in from other languages, as we all did (Go is not the first programming language for any of us), tend to lean on pointers over values, though for different reasons depending on their background. People coming from C++ may prefer pointers because they're wary of the cost of the copy constructor you'd invoke in C++ by passing an STL object by value. Someone coming from Java or Python or another more dynamic language might prefer pointers because they're coming from a language where everything is a pointer, and they're just used to thinking that way. But in Go you have the choice between pointers and values, and that choice can actually convey useful information about the program: in CockroachDB, whenever you see something passed as a pointer rather than a value, it immediately tells you that the object is mutable, and that either this method is going to change the object or it's going to be changed by something else and the new value needs to be seen by this method. There is a little bit of a cost to passing values around: you're not just copying eight bytes for a pointer, but maybe 32 bytes or more, depending on what exactly is in the struct. But that's a pretty small price to pay, and it's not been a noticeable impact on any performance that we can measure.
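As a small illustration of that convention (invented types, not from the CockroachDB source), the signature alone tells the reader which calls can mutate their argument:

```go
package main

import "fmt"

type span struct{ start, end int }

// grow takes a pointer: the signature alone signals that the
// argument will be mutated.
func grow(s *span, by int) { s.end += by }

// length takes a value: the copy signals that the caller's span
// cannot be changed by this call.
func length(s span) int { return s.end - s.start }

func main() {
	s := span{start: 2, end: 5}
	grow(&s, 3)              // pointer: s is visibly mutable here
	fmt.Println(length(s))   // value: read-only use, prints 6
}
```

The reader never has to look inside `length` to know their span is safe from mutation.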
We've never had a case where we changed from passing a value to a pointer for performance reasons; that's just never come up in our benchmarking. One thing you do have to watch out for, though, is that if the struct you're passing by value contains internal pointers, or types that are implicitly pointer-based like slices or maps, those will not be duplicated: the new value you copied will still be pointing into the same underlying map or slice. That can trip you up, because it may look like you're getting your own copy of a value when actually the important parts of that value are held through a pointer that is still shared with the previous copy. It's something you just have to learn to look for and be mindful of, and I think we've grown a lot more careful about it as the project has gone on, both as we've watched our allocations and performance and as we've run into bugs related to mutable data being used where it really shouldn't have been. We fixed those bugs and then changed our style to make sure bugs like that didn't happen again. So that's some of the stuff we've done to manage the cost of garbage collection and improve the allocation behavior of the application. Up next I'm going to talk about goroutines, another big headlining feature of Go. They're pretty nice, but they're not quite as game-changing as a lot of Go advocates would have you believe: they're really just cheap threads, and you have to think about them just like you would regular threads.
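Backing up to that shared-internals gotcha for a moment, here is a toy illustration (not CockroachDB code) of a struct copy that shares its slice's backing array:

```go
package main

import "fmt"

// row looks like a plain value type, but its data field is a slice
// header that points at shared backing storage.
type row struct {
	id   int
	data []byte
}

func main() {
	a := row{id: 1, data: []byte("hello")}
	b := a          // copies id and the slice header, not the bytes
	b.id = 2        // independent: a.id is untouched
	b.data[0] = 'H' // shared: writes through to a's backing array

	fmt.Println(a.id, string(a.data)) // prints "1 Hello"
}
```

The scalar field copied cleanly, but the write through `b.data` was visible via `a.data`, which is exactly the kind of accidental mutation described above.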
One easy trap to fall into with goroutines is that, while they are very easy to start, they're not very easy to monitor: you start them with the go statement, which doesn't return anything, so there's no way to get a handle to a running goroutine and keep track of whether it has finished or not. We had a concern early on about leaking goroutines: what if we had goroutines that were starting and never finishing, because they were blocked reading from a channel whose other end had gone away, or something like that? We wanted to be able to make sure we weren't leaking any goroutines, so we borrowed a file from the standard library (we found it in the net/http package) called leaktest. It's something you can use in your unit tests: basically a little hook that runs at the beginning and end of each test function to collect the list of all running goroutines, compare the goroutines running before and after, and fail the test if there are any goroutines running afterward that weren't running beforehand. As I said, it's not an original idea; for us it came from the standard library, and we copied it into our own codebase. Someone else then took it from the CockroachDB codebase and pulled it out into a separate, reusable library, so you can now get it on its own, and if you're doing anything complicated with goroutines, I'd encourage you to take a look and see if it would be useful in your own tests. To give you an example of what it looks like, here is a test that will currently fail, because it starts up a goroutine that never finishes.
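A toy version of that kind of check might look like the following. The real leaktest helper diffs goroutine stack dumps; this sketch just compares counts, and all the names here are made up:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// checkLeaks records the goroutine count now and returns a function
// that reports how many extra goroutines are still running when it
// is called, e.g. at the end of a test.
func checkLeaks() func() int {
	before := runtime.NumGoroutine()
	return func() int {
		// Give goroutines that are exiting a moment to wind down.
		time.Sleep(100 * time.Millisecond)
		if after := runtime.NumGoroutine(); after > before {
			return after - before
		}
		return 0
	}
}

func main() {
	verify := checkLeaks()

	ch := make(chan struct{})
	go func() { <-ch }() // nobody ever sends: this goroutine leaks

	fmt.Println("leaked goroutines:", verify()) // reports the stuck reader
}
```

The real library wraps this idea in the `defer leaktest.Check(t)()` idiom described next, so a leaked goroutine fails the test that started it.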
It just sleeps forever, so it's a way of proving to yourself that the leak test works. All you have to do is call this leaktest.Check method at the beginning of each of your test functions. It takes your testing.T, which is how it's going to report errors, and returns a function, so the syntax looks a little weird: the call to leaktest.Check happens at the beginning of the test, Check returns a function, and that function gets called at the end of the test via the defer statement. It's a little syntactic trick to make it concise to run something at both the beginning and end of your test. We do this in a number of places for test cleanup; it's a nice pattern for cases where you want to set some global context and then make sure you clean it up later, though it does look very strange the first time you see the syntax. As I said, goroutines are really just cheap threads, which means you have all the same concerns around concurrency, synchronization, and locking. One of the things Go does well is encouraging you to use channels instead of mutexes: channels help you minimize reliance on shared state by sending messages back and forth instead of writing to the same state on both sides of the channel. But we found that not all usage patterns are conducive to channels. In particular, channels work really well when you can model your system as an acyclic graph of goroutines that each work on a piece of data and then hand it off to something else; it's harder when the workflow of your system has cycles in it that you can't get rid of.
If you have two goroutines that need to communicate with each other in both directions, then you can run into deadlocks, even with channels, and we actually ran into a case of this in Cockroach. It has to do with our implementation of Raft, the distributed consensus protocol we use to maintain consistency across all the replicas of your data. Originally we implemented Raft in a way that tried to drink the Go Kool-Aid and use only channels: no mutexes, no shared data being passed around, just immutable objects being passed back and forth across channels. Which is great, right? Except we were missing something: a singleton goroutine that runs a select loop and reads and writes from different channels is a mutually exclusive resource. It's a kind of mutex; it can only be doing one thing at a time. And just like with mutexes, as soon as you have more than one of them in your system that may have to be acquired at the same time, you have to consider the order in which you acquire them to avoid deadlocks. We found this to be much more difficult to manage with channels than with traditional mutexes. In the original version of this code, all our channels were buffered, which masked the problem: most of the time the buffers had room, so you could send something and it would sit in the buffer while the other goroutine did its work. But this was a false sense of safety, because it meant our tests would pass: the tests never did enough work to fill up the channel buffers. As soon as we tried to run at a larger scale, the channel buffers
would fill up, and suddenly we'd have a deadlock. We found there was no reasonable upper bound we could put on the size of the channel: we could increase it to some absurd number, but then the channel would consume that memory whether it needed it or not. For a while we tried sending slices over the channel instead of individual objects, which is a way of effectively making a dynamically sized channel bounded only by available memory. That solved our deadlock problem, but at the expense of creating a problem looming in the future: we were afraid of something going wrong and blowing up memory usage by filling that unbounded channel. So eventually we ended up refactoring all of this code away from channels and back to mutexes, callbacks, and shared state, because even though dealing with mutexes and controlling lock ordering to avoid deadlocks is a very difficult problem, it's a problem we at least know how to talk about, approach, and solve. This turned out to be kind of a contagious decision, because as long as you're careful about ordering, it's reasonable to compose mutexes: you can say "I have to acquire mutex A before mutex B" and so on. With channels, it's difficult to even talk about the relationships between things; we used to have these big block comments saying that this mutex should never be held from such-and-such a goroutine, and things like that, and it was just really unwieldy to reason about and enforce. So over time we've probably been moving more away from channels and toward mutexes than the other way around.
We still have no intention of completely abandoning channels: when they fit the problem they're great, and they're kind of indispensable in Go because of their unique role in the select statement, which is pretty important. But they're not quite a panacea for concurrency issues, and sometimes going back to mutexes is the way to go. Up next: we use one big C++ library in our system, which is RocksDB. RocksDB is a local key-value store, a fork of LevelDB created by Facebook; you may have heard about it in Mark Callaghan's talk earlier today. It's how all of the lowest-level storage in CockroachDB is managed. The tricky thing about this for us is that RocksDB is, of course, a C++ library, and it's not something you're likely to find preinstalled on your system like libc or OpenSSL, so you can't rely on someone who wants to build CockroachDB having it already. We need to ship the tooling to get it to the user when they want to build CockroachDB from source. Originally what we had was a makefile that essentially wrapped up our build process. That meant you couldn't use go get on CockroachDB: you'd have to clone the repo, cd into it, and run make; you'd have to learn our build process instead of working with it the way you would any other Go tool. And we found there were a lot of compatibility problems with various other Go tools; in particular guru (or, at the time, oracle) didn't support the way in which we were building our C dependencies, and it
was tricky to version things: we ended up having to manually "make clean" all the time, because make's dependency checking would sometimes get confused and not rebuild things that needed to be rebuilt. So we hit upon something that we think we may have been the first, or among the first, to do (at least, I'm not sure anyone else was doing it before us): an idea that would let us build everything, including our non-Go dependencies, in Go, just using cgo directly. What we do is make little wrapper packages for all of our C dependencies. So we have a repo, cockroachdb/c-rocksdb, and if you look in that directory it has a bunch of .cc files: all of the source files of RocksDB, flattened out into a single directory because cgo doesn't understand how to look into subdirectories. Then there's one cgo control file, a Go source file that doesn't actually contain any Go code, just cgo directives, which are roughly equivalent to the output of ./configure. To give you an example, here's the one for Snappy, a compression library we use. I'm showing this one instead of RocksDB because this one will (kind of) fit on the screen, but it's what you need to know to compile Snappy. If you ran ./configure from Snappy's build scripts, it would detect things like this: it would detect that our C compiler supports C++11, and so it would tell the compiler that this code is using C++11, and it would set up some other defines and include paths and things like that.
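The shape of such a cgo control file looks roughly like this. The flags shown are illustrative of configure-style output, not the exact directives from the cockroachdb/c-snappy wrapper, and the file only builds when the vendored C++ sources sit alongside it:

```go
// Package snappy wraps the vendored C++ sources that sit alongside
// this file. There is no Go code here: the #cgo directives stand in
// for the output of ./configure, and building the package makes cgo
// compile every .cc file in the directory.
package snappy

// #cgo CXXFLAGS: -std=c++11
// #cgo CPPFLAGS: -DHAVE_CONFIG_H -I.
import "C"
```

Any other Go package can then depend on this one, and the C++ objects get built and linked as part of the ordinary go build.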
In this Go file, the only actual line of code besides the package declaration is import "C", and we don't actually use that import anywhere. When you build the package, cgo just compiles every .cc or .c or .m or whatever file in that directory and links it all up into a package that you can then depend on from your other Go code. This is really nice: it fixed the problems we were seeing with oracle/guru, and it means it's now actually possible to build and install CockroachDB with go get. I still wouldn't really recommend that, because you don't get the benefit of our dependency pinning; it would just build the master versions of everything. The recommended way to build CockroachDB is still to check out the source and run make, because that uses the right pinned dependency versions. But if you want to build it with go get, you certainly can, and once you have the right versions of everything you can continue to use go build and all of the standard build tools from there. The one drawback to this approach is that it bakes in some assumptions about the compiler and environment you're using, so these wrapper packages are potentially a little less portable than the underlying C packages would be. For example — you don't see it in the Snappy example, but in RocksDB — we have a few compiler directives that differ between GCC and Clang, and there's no build tag you can use from Go that lets you distinguish which compiler cgo is going to use. So instead we kind of fake it.
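As a sketch of what such a cgo control file looks like — the package name and flags here are illustrative, not the actual cockroachdb/c-snappy file — the entire file is directives plus an unused import:

```go
// Package snappy compiles the flattened Snappy C++ sources that sit
// alongside this file. There is no Go code here: the #cgo directives
// stand in for what ./configure would otherwise have detected.
package snappy

// #cgo CXXFLAGS: -std=c++11
// #cgo CPPFLAGS: -I. -DHAVE_CONFIG_H
import "C"
```

With this in place, a plain go build on the package compiles every C/C++ source file in the directory, and any other Go package can simply import it.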
We look at the platform tag: if it's Linux, it must be GCC; if it's Darwin, it must be Clang. We don't have full coverage for the BSD variants, but we have a couple of those in there as well. This is a little less than ideal — it means you're kind of managing other people's build processes by hand — but for the things we use it for, it has worked out pretty well. Another note about cgo: we've learned some lessons about performance. C is, not surprisingly, normally faster than Go: once you're in C you don't have to deal with the garbage collector, and it's usually faster because of that. But crossing the boundary between C and Go has a cost. It's not huge — we measured it at about 200 nanoseconds — and that sounds really small, but it turns out a modern CPU can do quite a bit in 200 nanoseconds. We found we were spending 200 nanoseconds of cgo overhead to make a function call into C that itself took only 50 nanoseconds, so we were able to make pretty sizable performance gains by moving that work from C into Go. This refactoring actually led to more work being done overall, because we ended up batching things up: we do more work on the Go side to build up a batch of work to pass over to C++, so that we can amortize the cost of the context switch, and then the C++ side has to unpack that batch and process it. There's some wasted work there, but because it avoids the overhead of crossing the boundary, it turned out to be worthwhile.
This overhead, in case you're curious, mainly comes from the fact that each cgo call needs to interact with the goroutine scheduler: the Go runtime needs to know when system calls are made, and it has to assume that any cgo call may make system calls without warning. If we could somehow mark a particular cgo function call as guaranteed not to make any system calls, that could be a way to cut down on the cost of going back and forth across the boundary, and then we could go back to doing this particular function call in C++ again. Finally, one of the first things people learn about Go — probably everyone in this room, whether you've written a line of Go or not, knows this — is that Go is good at producing static binaries. You don't need to ship a runtime library along with your program; it's just one executable and you're good to go. It turns out that's not entirely true. It's mostly true, but the problem is that on Linux there are certain glibc features that effectively require dynamic linking. In fact, if you try to force the linker to link glibc statically, glibc will basically fight against you: at runtime it will dynamically open another copy of itself, because it really wants to use dynamic linking for these features. I'm not sure of the entire history here — this seems like kind of a misguided feature from my perspective, though I'm sure there were good historical reasons for it at the time — but it includes things you don't really think of as glibc features, like some of the options you can set in /etc/resolv.conf. So one of the things we tried: we ran ldd on our binary and saw that it was dynamically linked against glibc.
We wanted to get rid of that, and when we found that glibc had this property, we said, okay, we'll use a different libc. We tried a library called musl, an alternative libc implementation that has no problems with being statically linked, and we were able to build that static binary and ship it. Then the first person who tried it said, "It's not working for me: DNS resolution is failing because it's not recognizing this option I have in my resolv.conf." So we ripped all of that back out, and now we're back to building binaries in the default way, dynamically linked against glibc — and mostly this works just fine. Until pretty recently, nearly all Linux systems used glibc as their libc implementation, and glibc has been very good about backwards compatibility for many years, so it was perfectly safe to take a binary built on one machine, ship it off to another machine, and have it just work. In fact, this is how the Go compiler itself is distributed on Linux: it links dynamically against glibc, and it works on the vast majority of Linux systems out there. The one catch is that recently, with the rise of Docker, people are looking into micro-distributions of Linux that don't have all of the complexity of the more traditional distributions; they want just the minimal runtime. In particular, Alpine Linux is popular for Docker images, and that distribution uses musl as its libc — it doesn't have glibc present at all. So you can't take our pre-built CockroachDB binary, or the Go compiler for that matter, and run it on Alpine Linux.
You can build both of those things from source on an Alpine Linux system and they will work, but you can't use the pre-published binaries there. So this is just a trade-off between making a binary that truly has no dependencies and one that will work the way the system administrator expects, and for now at least we think that points in the direction of just using glibc dynamically; if someone wants to run it on an Alpine distribution, they'll have to build it separately. This is something that I think is in a bit of flux right now, but for now I would say it would be kind of a waste of your time to try to build completely static binaries that don't depend on glibc — though who knows, maybe in the future that will change. Another thing we've come across that also implies a dependency between the binary and its environment is the time zone database. This is something you don't really think about — I suspect it's a problem for a lot of applications run in Docker containers — but there's a database of time zone information for the whole world, including things like when daylight saving time changes happen in every country and every time zone, and while you might not expect this data to change very often, it turns out it does: there are eight to ten updates a year to this time zone database. If your application deals with time — and as a SQL database we end up having to deal with just about everything, including time; we have functions for converting between time zones — then you rely on having an up-to-date zoneinfo file around.
So that's another thing you have to think about if you're going to run in one of these super-minimal Docker environments: a lot of the time the zoneinfo database isn't present or isn't kept up to date, and it may need to be mapped into your Docker container from the host system, or updated through some other means. And that concludes the lessons we've learned in developing CockroachDB in Go. Now I just want to give you a brief summary of where the project currently stands, in case you've gotten interested and want to check it out. We're currently in beta; it's definitely not production-ready. We will try not to eat your data if you use it, but at this point we're not quite ready to make any guarantees. If you want to start developing and testing with it, though, it's ready to go. As I said at the beginning of the talk, it's a SQL database, and we implement the Postgres network protocol, so you can use it from any language or environment that has a Postgres client driver — which is basically everything — so it's pretty easy to get started with. Also, just yesterday we released an integration package that makes it possible to use CockroachDB with the SQLAlchemy ORM in Python, so if you want a more full-stack environment to play with, that's something you can check out. Our near-term roadmap is primarily stability and performance, and that work will continue until the project gets to the level of quality we want. We're also working on distributed SQL processing, and on joins — joins are probably the biggest SQL feature we don't yet support that we know is important, and they're coming soon.
If you want to get in touch with us, we do all of our development on GitHub; we have a Gitter channel where we can communicate in real time; we also have a mailing list, which I didn't add to this slide, but it's there on the website, cockroachlabs.com; and we're @cockroachdb on Twitter. Thank you very much. Now I'll take questions.

Q: [From a member of the programming languages team at Google] This was a really great talk, and I really appreciate the detailed explanation of the issues you ran into and what you did to fix them — I took a bunch of notes, because this is roughly a to-do list for people on our team. I don't actually have any questions, just a few specific comments on things you brought up in your slides, so if anyone has actual questions, please go first. A few notes about patterns. You mentioned using sync.Pool to allocate and release buffers. One small idiom that makes that slightly safer: instead of calling release on the pointer — which leaves you still holding a non-nil pointer — change it to a function where you say release(&buf), and have it nil the pointer out. That gives you a somewhat stronger guarantee against use-after-free bugs; we use this, for example, in our RPC client within Google. You also mentioned leaktest and setting it up in each test: there are features coming in Go 1.7's testing package that make extracting common setup and teardown quite a bit simpler — you can generate test cases on the fly within your test, so the setup and teardown just wrap a loop. That may simplify your testing a little further.

A: We actually have a script that automatically makes sure all of our tests have that leaktest line in them.
Q: Some of those features just help in making your table-driven, data-driven tests a little cleaner and the result reporting a little more robust as well. Finally, I think the most interesting bit for me was the issue of channels versus mutexes. We certainly encountered some of the same issues building production software in Go within Google, and I think our rule of thumb now is: if all you need is a mutex — just synchronized access to shared state — use a mutex. But as soon as you start reaching for a condition variable, you should really be thinking about select and channels, because that's usually a much cleaner way. You'll find yourself kicking yourself once you rethink your condition as something you select on, and that usually turns out to be easier to reason about and work with. For simple synchronization, though, a mutex is a great thing to use.

A: The one downside of a pure mutex is that, as you said, you can't select on it, which means you can't decide to stop waiting for the mutex to become available. If it really is contended, there's no way to say "never mind, I don't want it anymore."

Q: Right — there again you probably want a select. And I'll talk to you offline about some new stuff we're working on that might be interesting.

A: Great, thanks a lot.

Q: Hi, my name is Daniel. Can you comment on vendoring and package management — which tool you're using (for example, godep) and how you keep control of dependency versions?

A: For dependency management we use a tool called glock. It's a very simple system: it just does a git checkout in each of your dependencies to pin its version. It's a little behind the times now because it doesn't support the new vendor mode that went in with Go 1.5, so it assumes it can basically take over your entire GOPATH and change the version of anything you have checked out that CockroachDB depends on.
It's not ideal, and we're looking to move to a tool that has been updated to support the vendor directory, but we haven't settled on one yet, because glock has a feature we really like that we haven't seen anywhere else: the ability to depend on a command-line tool and have it automatically installed into your GOPATH bin directory. That's actually what we use for all of the linters and similar tools in our build process, and we haven't seen a comparable feature in any of the other dependency management tools, so that's why we haven't switched to something more modern yet.

Q: Have you faced any pitfalls with defer? For example, in code where you defer closing a channel or an HTTP connection, and you're not checking whether it actually closed — you just trust that it will be closed at some point?

A: We haven't run into any problems with defer. We like to use it, especially with mutexes — we like to defer our unlock call to make sure it always happens — and we've not run into any situation where that doesn't work, although sometimes we have to introduce new function scopes to get the defer to run at the time we want. One mistake we made a number of times early on in development, when we were still new to Go, was putting a defer call in a loop: that doesn't run at the end of each iteration, it runs at the end of the function.
That was a mistake we made early on, but we haven't had any problems with defer other than that.

Q: Just a quick question: if you were to rewrite CockroachDB today, would you do it in Go?

A: That's kind of a loaded question. As I said at the beginning, I'm not able to answer it very confidently. I think we would definitely give Rust a serious look if we were starting over today; I don't know which way the decision would go. In my own experience, the very little Rust code I've written has involved a lot of painful compiler fighting, so I'm not sure that would be a net win — although Rust does have a lot of really nice properties in terms of ensuring memory safety and concurrency safety that I think would be very valuable, and maybe worth wrestling with the compiler a little more for. I don't know.

Q: You mentioned scalability as one of your goals. What kind of testing have you done for scaling, and how has Go handled it?

A: In terms of scaling, we have multiple levels of testing. We have our unit tests — really more integration tests than unit tests — that start up a number of nodes in-process; then another level of testing that uses Docker to start up a bunch of nodes locally in separate processes, with a little more isolation; and then larger tests on AWS and Google Cloud. I don't really have anything to say that's specifically about Go there; it just works.
The problems we're having there are of our own making; there's nothing we can really point to about Go in terms of scaling up in that regard.

Q: You mentioned how long it took to cross the boundary between C and Go, and that you decided to move more code to the Go side to make up for it. The earlier question mentioned defer, and you said there was a measurable difference — what order of magnitude was that difference, and why did it make sense for you to go outside that idiom?

A: Good question. I don't have the statistics in front of me. The cost of a defer is definitely smaller than the cost of a cgo call; I don't remember the exact figure. I know that calling defer in a loop potentially involves an allocation, so it carries allocator and garbage collection costs. For a single defer in a function call, I don't know of any fundamental reason why it would need to have significant cost — I believe the presence of a defer turns off certain compiler optimizations, because the compiler needs to be able to adjust the branching pattern to make sure you always get back to the deferred statement as the stack unwinds — but I don't know the details. It was enough to show up in this benchmark: we gained almost twenty percent on one of our benchmarks by getting rid of the unnecessary cgo calls, while removing the defer was something like a one percent improvement, and we did that only because it was a one-line change.

Q: Generally the rule of thumb is: don't worry about defer unless it shows up in your profile, and then it's pretty easy to restructure the code. There is a small cost, as you say,
because the deferred code needs to run at the right points — in particular, if there's a defer with a recover from a panic, it needs to be placed appropriately for the stack unwinding — but I can't speak to the specific implementation. I can certainly put you in touch with the right people; in the meantime, run the CPU profiler and it will tell you whether defer is showing up as an issue.

A: Yeah, we have a handful of places in the code where we explicitly avoid the use of defer — you saw one of them, and there are only three or four more in the codebase. Everywhere else we use defer, and it's just not measurable in any higher-level code.

Q: Just curious: have you thought about applying the Jepsen test to your system to see what comes out?

A: Yes, we have, and we have a blog post about that: if you go to cockroachlabs.com and click on the blog, there's a post from a couple of months ago about the testing we've done. We found some issues, which have since been fixed. That doesn't mean everything passes and everything's good — of course you can't really prove the absence of bugs with a test like this — but the results of our Jepsen testing so far have been encouraging. Thank you.

[Closing remarks] So we have reached the end of the conference. I would like very much to thank you all for coming out and supporting us in this endeavor. We are planning to hold the conference again next year, so I hope I'll have the opportunity to see you then. I just wanted to close by thanking our conference sponsors, without whom this would not be possible: Google, Nexus, the Kay Family Foundation, Facebook, and Backtrace.io. Have a good evening
in New York, a good rest of the week, and we hope to see you next year. Thank you.
Info
Channel: Association for Computing Machinery (ACM)
Views: 35,851
Id: hWNwI5q01gI
Length: 61min 45sec (3705 seconds)
Published: Wed Jun 22 2016