Lessons Learned from Packaging 10,000+ C++ Projects - Bret Brown & Daniel Ruoso - CppCon 2021

Captions
Hello everybody, welcome to our talk. We're going to talk about packaging in C++. I'm Bret Brown, and this is Daniel Ruoso. We're from Bloomberg Engineering, in the Developer Experience department, and a large part of our jobs is managing a heck of a lot of C++ code.

So why are we giving this talk? This is a tweet from yesterday by Matt Godbolt. He wanted to run a poll for his talk, and it happens to work in my talk rather well, so I borrowed it. He polled on which parts of C++ are the worst, and the plurality was clearly package management. We would like to be part of the solution, hopefully by starting a conversation and suggesting some places to focus on. As I said, we do need to converge more on how our packaging works, but we think there's a lot of common ground, and that's what we're going to talk about in this talk: where we think there's common ground to start on.

Why do you care about Bloomberg packaging? Maybe Bloomberg has special, extra-cool packaging, but how does it affect you? The answer is that it should look a lot like packaging you are already exposed to. What we do looks a lot like Linux distributions, and I'll get into more detail on exactly why that is.

In the first section of the talk, I'm going to describe how Bloomberg does its packaging. But first, how we don't do our packaging. We do not have a monorepo: we cannot assume that all the source code can be committed with the same pull requests, or the same VCS operations, to get atomic changes. We do not have a unified build system: we cannot assume that the build system can tell different parts of the code how they work together. We do not have a CI monoculture: not every project has to have the exact same continuous integration setup. We do not have a project-structure monoculture: we do not require every project to have the same source, include, and test directory layout. And we do not have a single unified release process: there are several ways to get the code you have onto the production machines where it needs to run.

What does our packaging look like? It looks like a mixture of third-party and first-party projects. We take a lot of code directly from the open source maintainers, as released, in zipped-up releases, and we build those as packages. We don't do what you'd call vendoring, which is where you take the source code from an upstream project, copy it into your own version control system, and build it with whatever build system you have. We build the source as it was given to us, as is, in a sandbox, take the artifacts out of those builds, and then use those in the subsequent builds of the projects that depend on them.

What we do is not even a new package management system; we like to use open source tools as much as possible. If you wanted to install one of our library packages into a virtual machine or into a container, you could literally use the apt-get commands you may be familiar with from distributions like Debian or Ubuntu. We have what you could call a partial distribution, meaning we like to use the operating-system packages as is. These are fairly low-level packages: libelf, perhaps your compression library if it's ABI-stable, and OpenSSL might be provided by your OS vendor. We don't touch those. But we provide everything on top, as much as possible, under our own prefix; we use /opt/bb for Bloomberg, and that's where we install all of our libraries, our developer tools, and all kinds of other things as well. We even have builds of things like Neovim that we install under /opt/bb.

Our package repositories are curated. What do I mean by that? It means each change set is explicitly promoted. We don't have a magic tool that goes around trying to pull out different versions of source files and see how it can get the whole thing compatible. We take a baseline, we submit a set of changed projects against that baseline, they get validated, and they land. Daniel will go into more detail on what that looks like later.

More on what our packaging looks like. We have a majority of CMake, just because people like CMake at Bloomberg; it's been really well adopted lately. But we also have a lot of other build systems. Like I said, we don't have a build-system monoculture: we have build frameworks based on GNU Make, we have SCons, and we've had some other tools in the past. It's worth noting that even within CMake there's a lot of variation; different projects can and do have different workflows that need to be supported in different ways.

We use pkg-config for inter-project library metadata. What does that mean? Gaby was just talking about this in a talk about modules: when you want to build against a library, you need to know certain things about that library so that you can build against it. That's true in the old world before modules, and it's going to be the same in the new world with modules. You need to know preprocessor definitions, you need to know where its headers are located, and you need to know whether there are any other special flags you need in order to understand the interfaces that library provides. We use pkg-config for that because, again, it's open source, and we like to use open source as much as possible. It's not perfect, but it actually gets the job done pretty well.
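To make pkg-config's role concrete, here's a minimal sketch in Python. The `.pc` file content is hypothetical (a made-up `foo` library installed under /opt/bb), and the parser implements only the variable-expansion subset of the format that the real pkg-config tool handles; it's an illustration of the kind of metadata being exchanged, not a replacement for pkg-config.

```python
import re

# A hypothetical foo.pc, describing how to compile and link against "foo".
PC_TEXT = """\
prefix=/opt/bb
includedir=${prefix}/include
libdir=${prefix}/lib64

Name: foo
Description: Hypothetical example library
Version: 1.2.3
Cflags: -I${includedir}
Libs: -L${libdir} -lfoo
"""

def parse_pc(text):
    """Parse the variable (name=value) and field (Name: value) subset
    of a pkg-config .pc file, expanding ${var} references."""
    variables, fields = {}, {}
    def expand(value):
        return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], value)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line and ("=" not in line or line.index(":") < line.index("=")):
            key, _, value = line.partition(":")
            fields[key.strip()] = expand(value.strip())
        else:
            key, _, value = line.partition("=")
            variables[key.strip()] = expand(value.strip())
    return variables, fields

variables, fields = parse_pc(PC_TEXT)
print(fields["Cflags"])  # -I/opt/bb/include
print(fields["Libs"])    # -L/opt/bb/lib64 -lfoo
```

The point is that a consumer's build system only needs the name `foo`; the flags, paths, and definitions come out of the metadata.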
We would like to see it iterated on as time goes on.

We mostly use pre-built libraries. We don't generally build dependencies on the fly as part of incorporating a dependency into your project; you usually consume them pre-built. The developer workflow is: you clone the project you want to work on, you code, and then you publish, and the people that depend on you consume the pre-built binaries.

We use static linking whenever possible, to make the release process simpler and easier to port between different release processes. But we still support dynamic linking, because there are libraries that just don't work well with static linking, including things like encryption libraries that need to be hot-patched to fix security bugs. So it's worth noting that whatever packaging system we come up with is going to have to support static linking really well, and it's going to have to support dynamic linking too.

How do developers work? We actually have a pretty flexible developer experience. A lot of things are off the shelf, and individual engineers can decide what they want to use for the job they're currently working on and pick their favorites. We're not team Emacs, we're not team Vim, we're not team VS Code; the engineer gets to decide what makes sense for that job. This is worth bringing up because a lot of the pain points engineers have with local development in the SDLC really boil down to packaging headaches. So it's really important that, as a community, we come together on some packaging standards, so that when people say "why can't I find header file xyz?" we can say "well, that looks like a packaging problem; did you do the standard thing to depend on xyz as a package?"

CI enrollment is decided per project, similar to our build systems: you can pick the CI system that makes sense for the job you're trying to do.

We have production package builds for ABI coherence. We have an integrated build system that evaluates the entire code base with the change applied, as an entire code base and not just as that one change, including aggressive dependency rebuilds, to make sure that we're ABI-coherent. A lot of changes are subtly ABI-breaking, or would introduce an ODR violation, or change how parameter passing works in registers. So it's very important to us that we don't assume anything; we actually validate that the build works as intended. Again, Daniel will go into more detail on exactly what this looks like.

We do have transactional changes to prevent broken releases. In a lot of cases you can release an individual project and say "I upgraded fmt." But there are times when you do need to change several projects together, and that needs to be an atomic change. Whatever packaging system we come up with as a community needs to be able to do likewise, and be able to model "this set of things goes together; if you take one without the others, you actually have something that's wrong."

We'll take questions at the end, if you don't mind; we have slide numbers if that's helpful. And now I'm going to hand it off to Daniel, who's going to talk about some requirements we've learned from our experiences.

Thank you. I think it's important to note that we selected three main themes that we think are the most significant in this space; I think we could probably spend a whole day here moaning about how hard package management and build systems are. So we're going to focus on the three main themes that we think have the most impact on package management in general.

The first one is the idea that a package management system needs to be build-system agnostic. If the package management intrudes into the build system, we will end up having lots of problems.
Hopefully I'm going to work through and convince you all of that.

The first aspect of this requirement is third-party packages, and we're going to do this a few times, where we tell a little story to try to contextualize how these conversations happen in our engineering organization. We start with a C++ coder who says, "Hey, I want to use Boost." The build person turns around and says, "Sure, just apt install it." Then the coder says, "Well, can we upgrade Ubuntu? Because the version in the LTS is too old." I'm sure many of you have experienced something similar; it's a very common problem that people go through.

And this is a common trajectory. You start using the system packages, because you want to be a C++ developer: you don't want to care about OpenSSL, you don't want to care about zlib, you don't want to care about any of that. So you just go apt-get install libzlib-dev and libboost-dev and all of that, and you're happy. But at some point you're going to say, "Maybe I need to start backporting Boost, because I can't upgrade the version of the operating system yet, but I can't be stuck on the old version of Boost." And it slowly evolves to the point where you suddenly realize you're almost building your own custom Linux distribution, at which point you're seriously considering changing careers, because it's not fun.

There are some other trajectories I have seen. What if you have a large shell script that builds all your third-party dependencies? This is actually something we had at Bloomberg a long time ago: an arcane collection of shell scripts and makefiles that three people knew how to change and how to rebuild. I don't think I need to explain why that's bad. You also have the option of rewriting the build systems of those third-party packages inside your own build system. That seems like a very attractive solution in the short term, but as you need to integrate more and more third-party code, it becomes more and more of a bottleneck in the amount of engineering effort you need, every time there is an update and the build doesn't work anymore. We have experienced this, for instance, trying to consume Bazel projects from the open source world, where things are outdated or don't match the environment exactly, and then it's a monumental effort to make that thing build again. And of course there is the naive option of just saving a tarball and hoping for the best, which I'm sure no one has ever done.

The second case for why it needs to be build-system agnostic is being future-proof, and again, a little story. The build person comes around and says, "Hey, I need you to change your CI for project X, because we need to update the build system." The C++ developer says, "Yeah, sure, I may be able to do it next quarter." "Cool, then I'll disable the CI." "What?" "Team Y will be blocked otherwise."

The important thing here is that the build system is closely tied to the workflow. The way you run tests, the way you clone your repo, the way you invoke your build system, the way it finds the dependencies: it's all very tightly coupled, and changing the workflow is hard. The hardest part of changing the build system is not actually rewriting your CMakeLists.txt; the hardest part is changing everything else that happens to know about how CMake was invoked. And changing everyone's workflow at once is even harder.

Even though we don't have a monorepo at Bloomberg, we have a wide gradient of repository sizes. I don't like calling them monorepos, because if it's "mono" there has to be one of them; they're just very large repos.
We see that the teams that have very large repos have a much harder time adapting to changes in their build workflow than the teams that have smaller repos, because you can adapt the smaller repositories as you work through them, without having to go through the backlog of the entire rest of the repository to do it. If you need a new feature of the build system that depends on changing how you do the build, you can do that localized in the smaller repo, and over time do the migration of every other part of your build system. In our experience, that proved to be more effective at changing things.

Our builds today look nothing like they did five years ago. Even if we were using CMake five years ago as much as we do now, it would be an entirely different flavor of CMake, because five years ago is when people started talking about "modern CMake," with targets and target properties. The build systems that started migrating to CMake five years ago at Bloomberg look nothing like the CMake repos that we have today, and that should be okay. In the future, it's going to be different again: modules are going to change how build systems work, and we need to be able to change the workflow one project at a time, because we just can't afford to pause the world to make all the changes that are needed in every single project.

All right, here's the third little story on build-system independence. A C++ coder from another team comes around and says, "Hey, our build is broken after the change you made." "We didn't change any public interface." "Well, header X is no longer visible." "Header X was never intended to be visible." "Well, it was in our include path, so we used it." And I'm sure no one has ever seen that. It's a basic fact of life: whatever is visible becomes an interface. That's true of your code, that's true of your APIs, that's true of your build system, that's true of every single aspect of the software development lifecycle: the layout of your source code, what libraries are in the same git repository, what build system you use.

So with these three points, I want to make the case that we need the package management to support multiple build systems. What does that mean? It means we need a hermetic build environment that has a source package as an input, with build-agnostic entry points; to use the example of Debian, the debian/rules file serves as this build-agnostic entry point. The source package produces binary packages, which are then used as inputs to the other builds. This forms a directed acyclic graph, and perhaps the most crucial point is that files on disk are the interface through which projects communicate with each other, and the package management's role is to answer which files you have in your system. I put a little asterisk on the "acyclic" part, because it's not strictly necessary, but you really, really should have a very good reason before you build a cyclic graph of packages, because it will be a pain in the ass.

So what does this look like? You start with a source package, let's say "waves," and that source package produces two binary packages, let's say libnoise-dev and libsine-dev. Then you have another source package, called "sounds," that declares a build dependency on libsine-dev. What that means is that the build orchestration system will create a hermetic environment and make the contents of libsine-dev available for the build of sounds, so that build can run. The sounds source package produces libtone-dev, and we say that libtone-dev depends on libsine-dev because, for instance, a header exposed by libtone-dev includes a header from libsine-dev, so you can only use libtone-dev if the files from libsine-dev are also available. And then you can have a third source package that build-depends on both of them, which produces even more binary packages.
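As an illustrative sketch of that graph, here's a toy Python model of the waves/sounds example. The third source package ("music", producing "libsong-dev") is a hypothetical name I've added to stand in for the "third package that build-depends on both"; the orchestration logic is a toy, not Bloomberg's actual system.

```python
from graphlib import TopologicalSorter

# Source packages: which binary packages they produce, and which binary
# packages they need present in their hermetic build environment.
SOURCES = {
    "waves":  {"produces": ["libnoise-dev", "libsine-dev"], "build_deps": []},
    "sounds": {"produces": ["libtone-dev"], "build_deps": ["libsine-dev"]},
    "music":  {"produces": ["libsong-dev"],
               "build_deps": ["libnoise-dev", "libtone-dev"]},
}

# Runtime dependencies between binary packages (e.g. a libtone-dev header
# includes a libsine-dev header, so libtone-dev is unusable without it).
BINARY_DEPS = {"libtone-dev": ["libsine-dev"]}

def producer(binary):
    """Which source package produces a given binary package."""
    return next(s for s, info in SOURCES.items() if binary in info["produces"])

def build_order():
    """Topologically sort source packages by their build dependencies."""
    graph = {src: {producer(b) for b in info["build_deps"]}
             for src, info in SOURCES.items()}
    return list(TopologicalSorter(graph).static_order())

def hermetic_env(source):
    """Binary packages installed into the sandbox for a source build:
    the declared build deps plus their transitive runtime deps."""
    env, stack = set(), list(SOURCES[source]["build_deps"])
    while stack:
        b = stack.pop()
        if b not in env:
            env.add(b)
            stack.extend(BINARY_DEPS.get(b, []))
    return sorted(env)

print(build_order())          # ['waves', 'sounds', 'music']
print(hermetic_env("music"))  # ['libnoise-dev', 'libsine-dev', 'libtone-dev']
```

Note that nothing in this model records *how* each source package builds; the interface between them is only the set of files (binary packages) each one contributes.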
So you can see how the hermetic build system starts essentially from scratch. You have your build baseline, which for us is the set of Red Hat Enterprise Linux libraries declared at compatibility level 1, the level they guarantee to keep compatible across major versions. Everything else we depend on, we build on top of that, source by source, from scratch. And each one of those sources could be using a different build system, and it's all fine. You could have the waves source package building with CMake and the sounds source package written in SCons, but because the interface between them is files on disk, nobody needs to know how each source package gets built. How a package was built is not part of its interface; the interface is literally "these are the files on disk." And now I hand it back to Bret, to talk about names.

Right, the easy problems, like how to name things. Here's another dialogue: a downstream engineer talking to someone who provides a library. "I'm trying to build against you, but the build fails; I get 'can't find big/path/to/core.h'." The library maintainer says, "Well, if you read the docs, you would know that you were supposed to have a -I flag that says /usr/include/department/team, and then you should actually be #including foobar/core.h, not that whole thing you're including." "Okay, I guess I can fix that, thanks. But I think I'd rather use modern CMake, and I know from all the talks at CppCon that you're not supposed to put magic include paths in your CMakeLists.txt." "Okay, fine. You still #include foobar/core.h like I said before, but you would call find_package on JurassicBark, and the target you would depend on is called Seymour::static." "Clearly that was a silly question; I'm sorry I asked in the first place."

Is everyone following along with all the names and how they map to each other? The point is that you shouldn't be able to, because this is not a good example. I'm exaggerating, but we have this problem now. And I'll call Daniel out on this: earlier he said you would depend on libzlib-dev, and that's actually not right; you would depend on libz-dev. That's the point: someone that's been doing this for so many years can easily type the wrong name. Why? Because it's a special name, and we've got to talk about that. Bespoke names need explaining.

So, again, just to reiterate: the way you write your #include statement is part of your API. Everybody needs to write it the same way, so that we don't accidentally include a different header that might be slightly different, or might be entirely incompatible. Your CMake integrations are part of your API; the names all need to make sense together. Your compiler flags and your linker flags are part of the API of how you consume that library. At these conferences we're used to talking only about what goes inside the .h files and the .cpp files, however you like your extensions, but we're pointing out that at the tooling level your API includes other things that really affect other pieces of code, and some of that code is written in languages other than C++, like debian/rules, which is written in Make, or your build system, and things like that.

One more example, just to elaborate a little more; same people, just later in the day. "Okay, well, I did what you said, and CMake said something about 'can't find JurassicBarkConfig.cmake'; I don't know what that means." "Well, that's because you didn't install the omicron-persei-8 package, which obviously provides that CMake helper." "And what's that?" "Well, that's the package that ships libfoobar, obviously."
And, you know, they have a nice drink afterwards to work things out. The point being: even the name of your package is part of your API, because it gets written into other people's projects. If you change it, or if you pick a bad name, it gets frozen into somebody else's project, and changing it is a breaking change.

So what we propose is that you should have a clear translation between all these different names you use. If your library is foo, have package names that are based on foo, using simple rules like adding a prefix and a suffix, like lib and -dev. If you depend on foo in your build system, you should have something pretty close to foo as your declared build-time dependency. If you're using package metadata, which we advocate, and which enough people use that it needs a de facto spec if not an official one, the name of your pkg-config metadata should likewise be discoverable under the name foo.

It's worth noting that clear translations between names are important because we get irritated and we would like to be better coworkers, better colleagues, and better collaborators, but they're also important for the tools. If a mapping is intuitive to a human but not to the tool that could generate that .pc file, you're going to have a worse developer experience for everybody. So we basically need algorithmic ways to take a name and use it in different contexts; consuming or generating pkg-config files are good examples.

The good news, though, is that the library name is a really good starting point. The open source community seems to naturally do a good job of keeping people out of each other's names. It's not perfect, but it's pretty good. For example, you don't seem to see another project called "googletest" that is also a testing framework but isn't the same googletest project.
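Here's a sketch of what that kind of algorithmic name translation could look like, using the prefix/suffix rules suggested above. The exact conventions (for example, `foo::foo` as the CMake target shape) are illustrative assumptions, not an established standard; the point is that every derived name is computable from, and reversible back to, the one library name.

```python
def name_artifacts(lib):
    """Derive packaging-facing names from a single library name,
    using simple, reversible prefix/suffix rules."""
    return {
        "debian_dev_package": f"lib{lib}-dev",  # apt/dpkg dev package
        "pkg_config": f"{lib}.pc",              # pkg-config metadata file
        "cmake_target": f"{lib}::{lib}",        # namespaced CMake target
        "include_prefix": f"{lib}/",            # headers live under <foo/...>
        "linker_flag": f"-l{lib}",
    }

def dev_package_to_lib(pkg):
    """The inverse mapping: recover the library name from the package name."""
    assert pkg.startswith("lib") and pkg.endswith("-dev")
    return pkg[len("lib"):-len("-dev")]

names = name_artifacts("foo")
print(names["debian_dev_package"])       # libfoo-dev
print(dev_package_to_lib("libfoo-dev"))  # foo
```

Because both directions are mechanical, a human never has to memorize a bespoke name, and a tool can generate or consume the metadata without a lookup table.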
That doesn't happen; it wouldn't get adopted. People would say, "why would you ever do that? It doesn't work." So we don't have those problems, and if we do a good job of picking names that naturally self-police, I think we'll find we do a good job of keeping things from being confusing for people and from conflicting.

Some registration mechanism would help. I would advocate for that if we can get it done, but I'm not sure we have to hold the phone until we can get something like that working. Being able to reference a library by a simple token helps: again, we talked about mapping names, and if it's a simple token it's going to be easier to do those mappings. But whatever we come up with, if there's a separator, it needs to be portable to many situations. The file system might use slashes to denote directories, but your packaging system might use dashes, with slashes being illegal, and yet some other metadata format might only use underscores. We need to be able to map into those different situations when there are separators between the parts of a name.

When we have names, it's important to make sure we can find the name conflicts when they happen, before things really blow up in people's faces in production. What we found works really well for us is using file conflicts to detect the interface conflicts. Basically, we can look at the contents of a package and ask, "does any other package have foobar/core.h?" And if we see that one does, we can say, "that's a conflict; you can't have both of those at the same time," and at the packaging-system level we can say, "never install these two things together; they don't work together." What helps with that is a single include-path convention across your build systems.
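A toy version of that file-conflict check follows. The package manifests are hypothetical; a real implementation would scan actual package contents, the way dpkg tracks which package owns each installed file.

```python
from collections import defaultdict

# Hypothetical package manifests: package name -> files it installs.
MANIFESTS = {
    "libfoobar-dev":  ["include/foobar/core.h", "lib/libfoobar.a"],
    "libfoobar2-dev": ["include/foobar/core.h", "lib/libfoobar2.a"],
    "libbaz-dev":     ["include/baz/baz.h", "lib/libbaz.a"],
}

def find_conflicts(manifests):
    """Map each file path to its owners; any path with more than one
    owner is an interface conflict, so those packages must never be
    co-installed."""
    owners = defaultdict(set)
    for pkg, files in manifests.items():
        for path in files:
            owners[path].add(pkg)
    return {path: sorted(pkgs)
            for path, pkgs in owners.items() if len(pkgs) > 1}

print(find_conflicts(MANIFESTS))
# {'include/foobar/core.h': ['libfoobar-dev', 'libfoobar2-dev']}
```

This only works if every package installs into the same coherent path layout, which is exactly why the single include-path convention matters.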
You don't want a lot of fancy rules like "we want the last two segments of the path, but on this project it's these two segments, and on that project it's those two." It should always be the same kind of relative path, or the same kind of absolute path, across all the packages. Similarly for libraries: if you just deploy to /usr/lib, or whatever your library prefix is, it's going to be really easy to tell, "oh, we have two libfoobars that we're trying to install at the same time." It's worth noting, again, that in modern CMake you wouldn't hard-code a -L flag; you would use something like CMAKE_PREFIX_PATH. But you wouldn't want a bunch of prefix paths, one for every library, if you could avoid it. Ideally you'd have just one coherent prefix, or at least one coherent policy, to make sure they don't step on each other. Similarly for build metadata, with something like PKG_CONFIG_PATH.

And a quick aside here about CMake names. I mentioned that they're part of your API, and they're related to your packaging. It would be great if we could all get on the same page about how you write what you depend on in your CMakeLists.txt, and how your dependencies work between your libraries. It's kind of the Wild West right now in the CMake community, with different kinds of namespacing happening. We actually like to use just the library name, and we find it works really well, which speaks to our point that the library name itself is pretty good. But we do see other things out there, like actual version numbers in the names of the things you're depending on, which, again, are not portable: somebody wants to upgrade to the next major version, and they have to go patch however many projects mention the old version number. Similarly, discovery mechanisms should not show up in the files inside your project. The version number and the discovery mechanism are usually properties above the build system, something the packaging system would be making decisions about. Your packaging ecosystem would decide "we're all using pkg-config," or "we're all using some daemon that does live lookups while you're building," or whatever that is. We want to keep that kind of information out of the build system so that we can have that separation of concerns. The packaging system can say, "oh yeah, we're using pkg-config; you don't know that, because you just say foo, but we'll make sure you get the pkg-config-discovered foo when the time comes."

And back to Daniel, to talk about the integration build system I was hinting at earlier.

All right. This is probably, concretely, the most challenging problem from an engineering perspective, but I think it is fundamentally important for the success of a package management system. Again, we'll start with a little story, of a C++ developer who comes around and says, "Version XYZ of your library fails against version ABC of libfoo-dev." "Yeah, we haven't migrated yet; we only work with version TYU." "But version TYU is incompatible with libbar-dev." In the process of preparing this talk, I was looking at other companies and what they do, and we learned that Amazon does something similar to what we do. In the articles we found, and in conversations we had with people at Amazon, they mentioned a thing called "version set hell": you end up in these situations because this is a really hard problem.

So I think the point here is that, statistically speaking, every combination of versions you didn't test is broken. And over time, you also can't assume that the library your team wrote is definitely never going to be reused, that it's fine for it to sit there with only you caring about it. Over time, every piece of code that exists will end up being reused in unintended ways. So the argument that I'm trying to make is that it pays to maintain a head of the integration which is always healthy and has as much code as possible.
And here there is an important point: we sit in between a monorepo and the full Wild West of many repos. We have a multi-repo developer experience and workflow, because you can still just operate on your local git repository. But we have a monorepo in the sense that there is one place that says, "here's the head, and in the head, here are the versions of all the code, and we know that all of that works together; you can try using it and it's going to be fine." It's just a baseline, and it's essentially the head of a graph. So the history of the integration build has a life separate from the history of each one of the independent repositories.

This is important because applications need to be able to make changes in specific contexts. Let's say the last time you deployed this application, the head of the integration build was two months ago, and now you have a real problem that you need to patch, and you need to patch it now. Do you want to take the chance of moving everything else to the latest versions? You didn't deploy from that context, and that context doesn't exist anymore, so you'd just have to take the head of everything again; that would be really bad. You need to be able to go back and apply your patch in the old context, and that allows teams to release fixes without introducing unrelated changes.

So what does that look like? We use the terms "snapshots" and "distributions" for this. Essentially, you have snapshot 1, which is a collection of libraries at specific versions. Over time, you upgrade one library in that snapshot, and you get a new snapshot; then you do another update, and you get a different snapshot. All of those are working and coherent; we know that we tested them and we built them, so we know they are usable. But at the same time, someone can go back to snapshot 1 and create a new branch of the integration build and say, "you know what, I have to go back, because I'm going to operate on the version that I had released, but now I need to make another sequence of changes in order to fix my code in production." So I essentially end up with a new graph; it's not a git graph, but it's a graph of versions that describes the entire context of the build.

The other point is that, at scale, it's really hard not to break the build. We learned this very iteratively: we implemented various versions of the system that does this integration, and in the end we arrived at a system that actually does it transactionally. What do I mean by transactional? You have that snapshot, and you make a change request to move the snapshot, to update the versions of one or more packages in it. The system builds everything on the side, in sandboxes, and saves all the artifacts. If the build succeeded, you commit that as the new head; if the build failed, you throw it away. And if two things in parallel both succeed, you rebase the second one, rebuild again, and then you can commit. This was a very important lesson for us, because before that, we just never had a green build anymore. When you have enough people, breakages just keep seeping in, and unless you're rebuilding the entire world on every PR, breakages are going to happen.

But you can branch the integration from any known working state. The working state gets saved as a snapshot, and anyone can create a new branch of development from any one of those points. So if you need to go back and patch, or if you realize that you need a new feature of a library that is not yet released, maybe you can ask the team to make a pre-release, and you can push their version into your integration before it goes into the main integration. Maybe, if I have a library that a lot of people use, I want to release it internally to my applications first.
Maybe, if I have a library that a lot of people use, I want to release it internally to my applications first: build all my applications with it, roll out all my applications, and validate that my library is actually working as expected, before releasing it to the integration build and exposing everyone else to it. And we can do clever things, copying and pruning the graph, to reduce the amount of churn. When I'm working in this parallel universe with only my applications, I don't necessarily care about rebuilding the entire universe that depends on me, so I copy the graph, prune it to the subset I care about, and only rebuild that. When I finally merge it back to the main integration, everything gets built, but I don't have to wait for the whole universe to rebuild before I release my application.

The other point is testing and static analysis. The integration build gets really expensive. We have a library called BDE at Bloomberg that every single C++ project at Bloomberg depends on, so when they release a new version, the build takes, I think, 12 hours now or something like that, because we're rebuilding the entire universe from scratch again. And if we have to roll that back, that's really bad, because now we need to spend another 16 hours rebuilding everything with their version rolled back, because a bunch of other changes have happened in between. Some common issues can be prevented with static analysis, and I'm going to echo Bjarne's message from earlier: implicit function prototypes are probably the most damaging thing that still persists in the C language today. Not so long ago we managed to make implicit prototypes a -Werror everywhere, but it was not easy.

And unit tests of your consumers provide significant additional coverage for your library. We have had many, many cases where a bug in a library was caught by a unit test ten levels down, the first time that particular code path was executed, when no other test in any other repo had exercised it. That has saved our skin more than once; this additional coverage really helps. Compiler bugs, for instance, are really hard to detect if you don't have those runtime tests, and Bret spends a lot of time caring about compiler versions. One of the things that really bugs us is when someone could have had a test in the integration build and didn't, and then they run their test and it fails because of a compiler bug we could have discovered five months earlier, when we were doing the early integration build tests with the new version of the compiler.

Also, a lot of mistakes are detected by just running the linker. There is a very unfortunate common mistake in C++, which is removing a function definition and forgetting to remove the function declaration. I hope someday we'll find a way to prevent this in the translation unit itself, but so far it's hard. So if you don't have additional static analysis, the linker can help you: if someone is calling that function, the compiler will accept it, but then a test will try to link, and it's going to fail.

And finally, API coherency. C++ libraries above the toolchain are built from source in our case, and I think the main lesson we've learned is that it is impractical to try to do semantic versioning in C++ at scale, because someone is going to get it wrong and think that adding a virtual method is a minor version bump, when it's a major version change because it's ABI-breaking. So we learned we cannot trust version schemes in any way. We build everything from scratch, and try to cache as much as possible to make it as fast as possible, but we assume that every change breaks the ABI, and therefore you have to do a dependent rebuild of the entire universe.
So, in summary, we think there are three main requirements that a package management system for C++ needs to address in order to be successful. The first is that it has to be independent of specific build systems, because the build space is too volatile: if standardizing package management is impossible, standardizing the build system is impossible squared. The second is that coherency in how names are used is fundamentally important: the relationship between the names in the package management system, the names in the build system, the names in the header files, and the C++ namespaces. Establishing that coherency is fundamental to a successful package management system. And finally, having an integration baseline. I was discussing this with some people earlier in the conference, and this is even a space for a vendor: a vendor can say, "I'm going to provide you with a repository of pre-built C++ code, with all these libraries that you can use coherently and be sure that everything is safe." And then you don't have to worry about being tied to that one vendor, because it's a standard package management system and you can switch to a different vendor if necessary. All right, thank you. I think we have some time for questions.

An audience member asked: what do you think you could gain if you could standardize on a build system at your organization, assuming all the engineers were also supporting that one build system? So the question was what we think we would gain by being able to standardize the build system in an organization. I think there is a problem with the question, in the sense that some organizations have a culture where that works: organizations with a verticalized software development experience, where everyone works the same way and the business is oriented that way, and that works out. But there are other organizations where it doesn't, and I'm going to explain Bloomberg a little bit: Bloomberg is two infrastructure companies and 300 startups. The startup that operates a particular financial application is accountable for the product they are delivering, and so they are empowered to make the decisions that make them the most productive. Culturally speaking, the idea of us saying "everyone has to use the same build system, period" is impractical, because we would need a team as big as the rest of the engineering team just to keep up with feature requests for the build systems. Allowing teams to diverge is a fundamental aspect of how the business operates. So is it technically better, in terms of engineering cost, to have a single build system across the entire organization? Yes. Is that a trade-off every organization can make? No.

I think there's also the point Bjarne was making earlier about languages: how sure can you be, when you're implementing a feature, that this is the last build system you're ever going to need? That's a really big question, and I don't know how to answer it. I do think, though, that there's probably a middle ground we can come to, in which certain aspects of your build can be described in standard ways. We're not sold that you could have a fully declarative build system, because we think there are weird odds and ends where you just need to drop down and do a little bit of procedural something to be productive. But there's definitely a lot of space for things like "the name of your library is in a YAML file"; we could probably come up with something like that as a community. That said, when I see someone pitch something like that, I am aggressively skeptical of most metadata-driven build systems: at the end of the day you end up with a learning cliff, where everyone does the metadata, and then the steam runs out, and now you're told, "I guess you have to use a completely different system, because this system is metadata-driven, it's pure, and it can't solve your problem." That's not a good outcome. But more questions?

Another audience member asked: when you're trying to make a new version of the integration, what if you have an incompatibility, where some project depends on an old version of a library? Are you just unable to update the version of the library, and hence any other projects that depend on the new version? So the question was: what if there is an incompatibility, you try to upgrade a library and it breaks things, and now you have people that depend on the new version of that library and people that are blocking the upgrade? That's a real problem, for sure. One valid workaround is the idea of branching the graph temporarily, but it's risky, because you need to merge back eventually; otherwise you're going to be maintaining those parallel branches forever. But you also need to accept that you need to pay the cost of keeping the library upgraded. And I think Bret can talk a little more about what Google Test upgrades look like for us. Yeah: we have some systems that don't support all the requirements of Google Test, so we have actually had to freeze the version of Google Test for quite some time now. We're working aggressively to fix that, but the timeline is quite long. It's on the roadmap; eventually we will converge and be the same as upstream. We're currently evaluating options, including these kinds of branching techniques, but they're going to be fairly long-lived branches, and if you think a long-lived branch in a repository is hard, try doing a long-lived branch in an entire code base. I don't recommend it.

There were more questions about that. One was about the snapshot branches: how often are teams working off the main integration line versus staying on their own snapshot branches? How long-lived or short-lived are those snapshots? We have CI systems that manage the snapshot branches, and they will regularly merge things back into the main branch. You're right that at some point you need a system, at that level, that looks for technical debt and pushes back on it, just like you would in a code base. You might say: hey, I noticed you have a distribution over here, it's really nice, but you're using a version of something that has security bugs in it; you're going to have to stop whatever you were planning on doing, fix your security bugs, and merge back into the mainline. One thing that was implemented a couple of months ago is that we now have a similarity metric on each one of those branches, and whenever the similarity metric falls too low, you get a bug ticket saying: you're branching too far, come back.

A question from the back: how do you prevent people from creating circular dependencies? Well, this is actually easy with the system, because the files won't install: you haven't built that far up yet. In this case we encode it in the system, and it is literally impossible to build a cycle, because we build this hermetic thing that goes from source to binary, source to binary. If you try to build something that build-depends on something that wasn't built yet, it can't find the files, so it's literally impossible to create that. In theory you could have a clever system where you grab a set of source packages together, put them in a sandbox, and build them together somehow; and if you only depend on headers, maybe that works. But then again, you probably need compatible build systems, such that you can look at someone else's source project.
There is actually an interesting story there, because, as Bret mentioned, we have an unreasonable amount of third-party stuff in our package system, including emacs and vi; we have the entire Xorg client stack in our package system, and the Xorg system is notorious for all the cycles it has. People have been inching away at this problem by creating the proto packages, which are essentially forward declarations for functions that are going to be implemented later. So you have to break the Xorg packaging down into, say, the render proto package, which things then depend on, and then xrender depends on both, and then you can build. There are some hacks you can do to make that work. And we're glad to see Boost packages being broken up for similar reasons.

Sean then raised a question about libraries that depend on a particular macro that reconfigures a header file: if you have a different library and their build doesn't set that macro, you've got a problem. So the question was: how do you deal with preprocessor definitions that change the interface of headers you might be building against? Is that a fair summary? When we were rehearsing the talk, Bret used an expression that I'm failing to remember; it was something like "accept divergence, but don't tolerate it." [Laughter] One thing we do is keep a lookout in the pkg-config files, and if people start putting weird stuff, like capital -D flags, in the pkg-config file, that's a red flag for us to go look: are you trying to do conditional compilation in your header? Why? Stop it. So the advice we give to engineers is: if you have a preprocessor definition, put it in your C++ interface, inside the header files, and put it in one place; don't put it everywhere. Then you have that one header that defines, for the whole ecosystem, how we're going to interpret things.
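That "one header" advice might look like the following sketch; the file and macro names are hypothetical, invented purely for illustration:

```cpp
// foo_config.h (hypothetical): the single place where interface-affecting
// preprocessor decisions live, installed alongside the library's headers,
// instead of -D flags scattered through .pc files or consumer builds.
#ifndef FOO_CONFIG_H
#define FOO_CONFIG_H

// Decided once, at package build time. Every consumer that includes any
// foo header sees exactly the same interface.
#define FOO_ENABLE_WIDE_STRINGS 1

#endif

// Every public foo header then begins with:
//   #include <foo_config.h>
// so two translation units cannot disagree about the interface.
```

The point is that the decision is made once, by the package, rather than independently (and possibly inconsistently) by every consumer's build flags.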
Especially older libraries had the neat trick where you could get a different string type depending on some macro; it was horrible, and people had ODR violations all over the place. Sean pointed out that some interfaces used __has_include to detect whether a header was available, and then configured which API they were using based on that. We have the same problem; and if you want some controversy from this talk, here it is: we have the same problem with the standard version. There's a non-trivial amount of code out there that detects C++ language features and then has ABI incompatibilities depending on them. That means we do not allow projects to set the C++ standard version per project, because doing so creates ODR violations in and of itself. This will be surprising to a lot of you, because there's been a lot of talk about the standard version not being an ABI-breaking change, and that is true for the baseline packages. As Daniel was saying, we have system packages provided by Red Hat, and those are very ABI-compatible no matter what your standard version is. But everything above that is on the honor system. So if concepts happen to be available, and a library happens to detect that you're using concepts, you get different mangled names for things: best-case scenario, you get linker errors; worst-case scenario, you're missing variables and you get off-by-16-bytes errors when you're copying things. It's really bad. Sean pointed out that different implementations of optional have the same problem, and we actually have it too, because our BDE library provides forward-compatibility syntax for things like optional, so that we can use it in cases where it's not available. And that's literally the kind of problem we have: BDE will detect that you're on a language standard whose standard library supports these things, so it uses those instead, which is great, but it is an ABI-breaking change.

One thing we do in that respect: we don't have a lot of control over the build systems, but we do have more control over the build toolchain settings. So for our CMake usage, our autoconf usage, all of that, we have a bunch of tricks to make sure we keep a consistent toolchain configuration across the entire set. Anything we think is an ABI-important flag, we put inside the toolchain files as much as possible.

All right, I think that's our time. Oh, you have a question. The question is: do we do build caching? The short answer is yes. At the package level it's actually not that hard, because we have build IDs per built package, so we can just check whether the build IDs match up and reuse the result. We're also investing heavily in as much build caching as humanly possible, so there's more coming up. One change we introduced recently is that if your build produces the same content, even if the metadata is different, we'll reuse the cached result whenever someone consumes you. And there is a group at Bloomberg that is heavily invested in the Remote Execution API: Bloomberg maintains BuildGrid, which is one implementation of the Remote Execution API, and we're heavily invested in introducing what we call a remote-execution caching compiler. It's essentially a wrapper for the compiler that uses remote execution to do caching per translation unit as well. So we're going to have this tiered system of caching, because our builds do take a lot of time.
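A sketch of what "ABI-important flags live in the toolchain file" might look like in CMake. This is a hypothetical file, and the specific flags are examples, not Bloomberg's actual settings:

```cmake
# toolchain.cmake (hypothetical), passed via -DCMAKE_TOOLCHAIN_FILE=...
# One C++ standard for every package in the snapshot; individual projects
# do not get to override it, to avoid the ODR/ABI hazards described above.
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Anything considered ABI-relevant is pinned centrally, not in project
# files, so every package in the integration build agrees on it.
string(APPEND CMAKE_CXX_FLAGS_INIT " -D_GLIBCXX_USE_CXX11_ABI=1")
```

Because `*_FLAGS_INIT` variables are applied when CMake first configures a project, every package built under this toolchain file starts from the same baseline.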
What we have so far is already open source, at least in part. The remote execution stuff is all open source. The package management part has a lot of Bloomberg-specific pieces; the reason we have a bunch of internal code is that we have to support Solaris and AIX, so we can't just use the Debian infrastructure. But the Debian infrastructure has been using chroots and all of that for ages; I think the first time I did hermetic package building in Debian was maybe 2001. All that infrastructure is already open source, so if you want to grab the Debian infrastructure and build packages inside a chroot, with only the build dependencies available, it's all there. We have a bunch more on top because of the extra constraints we have, but I think we would be super hyped if we started having convergence: the same way we have a team investing in BuildGrid for remote execution, if there's a direction toward standardizing package managers, I'm sure we're going to be all over it. We also have some things in the CMake space, tangential but related, where we could look at open-sourcing some tooling to make it easier to fill in your package metadata given the targets you've defined in your CMakeLists. You can do that now in CMake, but it tends to be a lot of procedural code, and we find there's a declarative way that's a lot simpler.

Another question: does your system play well at all with Conan? We have a fledgling adoption of Conan; when I say fledgling, I mean many dozens of projects, so it's not nothing, but it's definitely an area of research. We have a plan, which we're most of the way through, to have a generalized CMake style that works regardless of whether you're using Conan or Debian. I did mention in one of my slides how you don't want to say in your CMakeLists where you got your dependency; I think that's something we really need to work on in the CMake community. The Conan adoption has been uneven, partly because of the problem I mentioned: most of the Conan integrations have the word "conan" in them somewhere inside your CMakeLists.txt. So you tend to end up with "if there's a Conan build file, do this branch; if there's not, do this other branch," and it starts looking procedural: it's the kind of "ifdef, I'm on this architecture" stuff we hate from low-level driver code, except in CMake. And again, you're not going to do that for 10,000 projects; it's not viable. But it is worth pointing out that there are two sides to the Conan effort at Bloomberg. One is what we call non-intrusive Conan; Michael McGuire from Bloomberg has been heavily involved with Conan upstream, and there has been some movement toward non-intrusive Conan support in CMake, as Bret was saying. And also, we have essentially been reproducing all the techniques we had in the Debian packaging system on top of the Conan builds; the scope is much smaller right now, but the principle is exactly the same; they even have "build all," which is the command that builds all your dependencies. The goal for us is that at some point you write one style of C++ and you just pick where you're going to release it: Conan, Debian multi-arch, whatever; who cares. I think that's something we can get to as a community: your package manager should be something you opt into, not something you have to port into. That's where we need these kinds of standards.
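The "non-intrusive" goal mentioned above, sketched in CMake terms: the project's own files name only the dependency, never the package manager. Target names here are hypothetical:

```cmake
# CMakeLists.txt: no "conan", "debian", or "vcpkg" anywhere.
# Whichever package manager is in use supplies foo's config behind
# find_package(), via CMAKE_TOOLCHAIN_FILE or CMAKE_PREFIX_PATH.
find_package(foo REQUIRED)

add_executable(app main.cpp)
target_link_libraries(app PRIVATE foo::foo)

# The anti-pattern this replaces (Conan 1.x-era provenance branching):
#   if(EXISTS ${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
#     include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
#     conan_basic_setup()
#   else()
#     ...a completely different discovery path...
#   endif()
```

With this shape, switching the project between Conan, Debian packages, or vcpkg is a change to how the build is invoked, not to the CMakeLists.txt itself.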
Right, and the same question for vcpkg: it seems popular, but have we adopted it? Not in any significant way. I've casually researched it, and it's very interesting. Some of the more recent features, where you don't have to fork the vcpkg repo itself to do some of the things we just talked about, make it more viable than it used to be. And the session's over; we're happy to take more questions in the hallway afterwards, right after we get our mics off. Thanks to everyone for coming, and good questions, too.
Info
Channel: CppCon
Views: 4,568
Keywords: c++ talk, c++ talk video, cpp talk, cpp talk video, c++, cpp, cppcon, c++con, cpp con, c++ con, c++ tutorial, c++ workshop, learn cpp, learn c++, programming, coding, software, software development, cppcon 2021, bash films, c++ packages, package management c++, cpp packages, c++ package tools, cpp package manager, building c++ packages, c++ package development, bret brown, daniel ruoso, ide c++ packages, bret brown c++, daniel ruoso c++, c++ talk videos, cpp talk videos
Id: R1E1tmeqxBY
Length: 60min 45sec (3645 seconds)
Published: Sun Dec 19 2021