BazelCon 2019 Day 1: Building Self Driving Cars with Bazel

[Music] Hello everybody, welcome back, and thanks for coming back; thanks to the folks up in the balcony as well. We're just going to get started with the next talk. We've got Axel and Patrick from the BMW Group, and they're going to talk about building self-driving cars with Bazel.

Thanks, Jay. Hi everybody, I'm Axel, this is Patrick, and we will show you how we use Bazel to build self-driving cars. I started at BMW as a software engineer; by now I mostly take care of our CI systems, making sure that all our developers get their feedback as fast as possible.

Yeah, so my name is Patrick, and I'm currently in the role of lead release engineer at BMW, especially in this project. I have some experience with builds from a long history of building software for embedded systems: I started with small embedded systems where you can easily keep an overview of the whole source code, moved up to infotainment systems, which nowadays are more or less a Linux distribution, and now to a cluster of multiple embedded systems built from one source tree.

Our department is working on every driver-assistance-related feature that we offer, from the beeping sounds you hear when you park your car up to fully autonomous driving in cities. For BMW, that's quite an advanced and big software project, and we are trying to use state-of-the-art tools wherever possible, which is also one of the reasons we're here. It hasn't always been that way. Just four years ago, roughly, our software was split into 200 different software components, all of them in different repositories, and for most of those you didn't have access rights, so you couldn't even see what your fellow co-workers were doing. We also had to set up two different build toolchains because we didn't trust either one of them, just to make sure that we don't screw up and you don't die when you drive our cars. As a result, the feedback was very delayed: it took a lot of time for developers to know whether they had introduced a new bug or not. To summarize, at that
time we mostly used MATLAB and C, and the developers used Windows as their host system. The build tools on the development machines were CMake and SCons, and for the CI we set up Jenkins. In the last three to four years we grew a lot, really a lot: now we have more than 23 million lines of source code, close to 2,000 developers are working on that software stack, and that results in more than 20,000 CI builds per day. We mostly code in C++ and Python these days, but you will find all sorts of programming languages in our software stack. As host system we nowadays mostly use Linux, but for some corner cases Windows is still needed, even on the CI. The software is not actually deployed to Linux or Windows and then shipped to the customer, but to very special hardware with a special operating system, which is then built into the cars at the factory. This hardware uses completely different architectures, several of them, such as AMD64 and MIPS, and this is what the end customer gets.

We also set up a new CI system; it's now hosted on our on-premise cloud, and we will talk about it in a bit. It's not just the scale that grew a lot; the complexity of the features also changed a lot. Autonomous driving became a thing, and we started working on it at large scale a couple of years ago. With these advanced features you need advanced tools and advanced simulators, which is a huge challenge for your CI system. At the same time, we adopted a very agile way of working, so our teams now usually modify several software components at once. That puts even further stress on the CI system, because now you need to run multiple test suites for the average code change. All of those things combined, the growth of our developer base, of the source code, and of the tools, plus this new way of working, pretty much broke our old build system. We had really huge issues with feedback time and stability, so we started to look for alternatives, and that's
when Patrick came across Bazel.

Yep. So back then, when we changed the way we work and the code we develop, we also took the opportunity to look at our tool landscape and see what kind of tool could fit our new requirements. Our requirements are no longer derived only from the languages we use in our codebase, but also from the safety point of view, so we actually need a build system that satisfies all the safety requirements. Three years ago I went to the Bazel conference here, the same place actually, and there were some Lego bricks, and yeah, that was the reason why we picked Bazel.

Okay, the official version sounds a bit different. We looked into various tools, Buck, Pants, CMake, SCons, you name it, but from past experience we knew there are some features that really help to build software the right way. One feature is of course sandboxing. We need to ensure that we include and link the correct files into our build, and not just another file that is named the same way but sits in a different tree for a different purpose. As long as your dependencies are clearly defined, sandboxing puts only the stuff you have declared into your sandbox and uses only that during the build. The next thing is incremental builds: as your codebase grows, you need a tool that supports incremental builds, and on the plus side we can also run tests incrementally. That increased the velocity of our developers a lot. We've written it on the slide: only rumors say there's a bazel clean. Unfortunately, our developers were used to working like this: "Oh, it doesn't work, let's do a make clean and try again. Damn it, it still fails." Of course it does, because you have an issue in your rule, in your BUILD file, or in your code; it's no longer related to the build tool or some weird configuration.
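The sandboxing idea mentioned above can be illustrated with a toy Python sketch: only the inputs an action declares are copied into a fresh directory, so a header with the same name in another tree cannot be picked up by accident. This is not how Bazel implements its sandboxes (those are built on symlink trees and, on Linux, namespaces); the function names and the example file layout are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def make_sandbox(workspace: Path, declared_inputs: list[str], root: Path) -> Path:
    """Copy only the declared input files from `workspace` into `root`."""
    for rel in declared_inputs:
        dst = root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(workspace / rel, dst)
    return root


def resolve_include(root: Path, include_dirs: list[str], header: str) -> Optional[str]:
    """Search the include dirs in order, like a compiler's -I path, inside `root`."""
    for inc in include_dirs:
        candidate = root / inc / header
        if candidate.is_file():
            return candidate.relative_to(root).as_posix()
    return None


def demo() -> tuple[Optional[str], Optional[str]]:
    with tempfile.TemporaryDirectory() as ws_dir, tempfile.TemporaryDirectory() as sb_dir:
        ws = Path(ws_dir)
        # Hypothetical layout: two headers with the same name in different trees.
        for rel in ("legacy/include/types.h", "perception/include/types.h"):
            (ws / rel).parent.mkdir(parents=True, exist_ok=True)
            (ws / rel).write_text(f"// {rel}\n")
        include_dirs = ["legacy/include", "perception/include"]  # legacy is searched first
        # Without a sandbox, the search finds the legacy header first.
        unsandboxed = resolve_include(ws, include_dirs, "types.h")
        # In the sandbox, only the declared dependency exists.
        sb = make_sandbox(ws, ["perception/include/types.h"], Path(sb_dir))
        sandboxed = resolve_include(sb, include_dirs, "types.h")
        return unsandboxed, sandboxed


if __name__ == "__main__":
    print(demo())  # ('legacy/include/types.h', 'perception/include/types.h')
```

The point of the design is that correctness no longer depends on include-path order: anything not declared simply is not there.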
It took us a while, but nowadays it's pretty well accepted that people don't clean their workspace anymore, except for developers of rules, of course, who still need to do it regularly. For that we use the --output_user_root flag on the command line to temporarily point Bazel to another place where we can run our builds and test the rules.

The last thing on the slide is dependency management, in the sense of ISO 26262, which is where we derive our safety requirements from. Certain things are described there so that you don't repeat mistakes others have already made. This morning we saw a nice picture, one of the first slides, showing what you shouldn't do; basically, people have written such lessons down as requirements in specifications, which ended up in ISO 26262. One of them is dependency management: you need a clear process for your dependencies. We figured out that you can establish additional processes and put additional tools in place to manage your dependencies, but it becomes much easier if your build system itself already maintains your dependencies and you treat them as code in your source tree. That means any change you make to them can easily be gated and reviewed as well; you can make change requests or pull requests for them.

On the next slide we first see hermetic builds, which is kind of crucial: we can easily say that our build is no longer impacted by the host environment. This "works for me" thing is not happening anymore, not at all; it simply reduces how much your environment can influence your build. That brings us to the next point, reproducible builds. Bazel has a focus on making your build reproducible, but just because Bazel tries to do that doesn't mean it actually happens, or that it's there by definition.
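One simple way to check the reproducibility point in practice, sketched in Python under the assumption that the outputs of two builds are available as directories: hash every output file and report the paths whose digests differ. The directory roles ("cloud" and "bare metal") echo the comparison the talk describes, but the file names and the timestamped version file are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file under `root` (relative POSIX path) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(a: Path, b: Path) -> list[str]:
    """Return paths that are missing on one side or whose contents differ."""
    da, db = digest_tree(a), digest_tree(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as cloud, tempfile.TemporaryDirectory() as bare_metal:
        for root, stamp in ((Path(cloud), "10:00"), (Path(bare_metal), "10:05")):
            (root / "bin").mkdir()
            (root / "bin" / "planner").write_bytes(b"\x7fELF identical payload")
            # A non-hermetic step that embeds the build time breaks reproducibility.
            (root / "bin" / "version.txt").write_text(f"built at {stamp}\n")
        return diff_builds(Path(cloud), Path(bare_metal))


if __name__ == "__main__":
    print(demo())  # ['bin/version.txt']
```

An empty result means the two environments produced bit-identical outputs; any listed path points directly at the non-reproducible step.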
Of course, we have additional tools in our codebase which could still produce non-hermetic, non-reproducible outputs, but with Bazel it becomes more obvious when this is happening. Having reproducible builds also allows you to validate changes to your host environment against a previous build. So, again for the safety case, we can detect whether the environment has an impact on our build results: we compare the results we build in the cloud with the results we build on a bare-metal host, we see that there is no difference, and so we know we can trust our CI system, which runs thousands of builds in the cloud environment.

Queries: I mentioned initially that I started with small embedded systems where you can easily keep an overview of the code. That's not possible anymore, and we need support for seeing what is going on during the build and why things are happening, and the query part is really helpful for that, so thanks a lot. It also helps us to set up some tests: the genquery rule is basically something you can use to query the dependencies of a certain target, and with it you can easily detect architectural violations. You can see what kind of dependencies are pulled in, and you can create whitelists for them, so that you do not pull in dependencies that were originally meant for a different target.

All in all, we figured out that we could achieve the same with other build tools as well, but only if we added a bunch of other tools around them and built up a massive tool landscape to reach the same goals. In the end that would be hard to maintain, hard to review, and hard, in terms of safety, to see how the tools interact with each other. So it comes in quite handy that there is one tool that provides all of this in one place, and we just need to review this one tool.
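A minimal sketch of the check such a genquery-based test could perform, assuming the genquery target writes one label per line (for example for a deps() expression over a target-side component) and that the whitelist is a list of allowed package prefixes. The labels, prefixes, and function name are hypothetical; in a Bazel setup, a script like this could run as a py_test that takes the genquery output as data.

```python
def find_violations(query_output: str, allowed_prefixes: list[str]) -> list[str]:
    """Return dependency labels that match none of the allowed package prefixes.

    `query_output` is assumed to contain one label per line, as written
    to a file by a genquery target.
    """
    violations = []
    for line in query_output.splitlines():
        label = line.strip()
        if not label:
            continue
        if not any(label.startswith(prefix) for prefix in allowed_prefixes):
            violations.append(label)
    return violations


if __name__ == "__main__":
    # Hypothetical query result for a component that ships to the car.
    output = """
//planning/trajectory:trajectory
//planning/common:geometry
//simulation/ros_bridge:bridge
@eigen//:eigen
"""
    # Hypothetical whitelist: target code may depend only on planning code and Eigen.
    allowed = ["//planning/", "@eigen//"]
    print(find_violations(output, allowed))  # ['//simulation/ros_bridge:bridge']
```

Because the query result is itself a build output, the check reruns only when the dependency graph of the queried target changes.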
So how did we migrate? Now that Patrick had convinced us, his fellow engineers, that Bazel is the cool thing, the question came up: how do we pull this off? I mean, you saw it before, 23 million lines of source code; back then it was a little less, but still we are talking about a huge effort to migrate an existing software stack to a completely different build tool, and all this while staying deployable the whole time. You cannot just stop everything, migrate, and continue two months later; every product manager would just kill you. So we came up with a plan, the plan worked out well for us, and that's why I would like to share it with you; it might help you as well.

What you see here is the time axis and the usage of our CI systems. You see our old CMake system, which kept running in the background the whole time and at some point was shut down, and you see the Bazel system, which first slowly and then very rapidly was able to build our software stack; in the end it's the only tool we have left. The way we did it: we first approached our management and said, "Hey, we have this new tool here, and we think it might help us solve some of the issues we're having. Please give us a team of five to six people to look into this more deeply." They gave us the team, and we started the next day. What we did was write down all the use cases that the CMake CI system was fulfilling, and then we picked, on purpose, the hardest ones, the ones that are hardest to do in Bazel. For us that was the integration of our simulation middleware, which we will talk about later as well. Why did we pick the hard stuff at the beginning? Because we wanted to find potential blockers as fast as possible. Three months in, actually way longer than we thought it would take, we hadn't found any blocker, and we reached a point where we said: the remaining 90% of our workspace can pretty much be migrated by copying what we already did for the first 10%. That gave us a pretty good feeling, so we approached our management again and said, "Hey, I think we
figured it out. This will actually work; let's roll it out." And so we did. For roughly a month, almost every other developer in our department worked on Bazel-izing our source code. Initially we thought this might take one or two weeks; it took a month. What they basically did was write BUILD files wherever there used to be a CMake file. For sure, we found some things that turned out to be way more complicated than we thought, which delayed the whole thing a bit further, but at some point we got there. The final test drive on the road was done, we could drive the car using Bazel, and at that point we switched off the CMake CI system.

Why does this approach work well? First of all, it's easy to sell to your management, because you don't ask for a full Bazel migration at the beginning. You ask for a small team to investigate whether there are any blockers, and only once you can technically prove that you're very likely to pull off the whole thing do you go for the final decision. This is really easy to sell, and it's also safe for you. The other benefit is that you only need a lot of manpower for a very short period of time, the rollout phase. And at that point, when a lot of your developers come in contact with Bazel for the first time, you will already have a team of Bazel experts in your company who have worked with Bazel in depth for a couple of months, which is a pretty good learning resource for every other developer.

Okay, so how is Bazel working out for us now? Actually quite well. When it comes to execution times, for example for unit tests, we were able to achieve a 10x speed improvement using remote caching. For the build for the final target, the hardware that is then put into the car, we are even able to speed it up by roughly 12 times.
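The speedups above come from caching build and test actions. Roughly speaking, Bazel keys each action on its command line and the digests of its declared inputs, and reuses a stored result when the key matches. The toy Python class below sketches that idea; it is not Bazel's remote-cache protocol (the real one is the Remote Execution API), and the codegen command, model.slx input, and fake generator are hypothetical stand-ins for a slow step such as a code generator.

```python
import hashlib
from typing import Callable


class ActionCache:
    """Toy content-addressed action cache standing in for a local or remote cache."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def action_key(command: list[str], inputs: dict[str, bytes]) -> str:
        """Hash the command line plus the path and content digest of every input."""
        h = hashlib.sha256()
        for arg in command:
            h.update(arg.encode() + b"\0")
        h.update(b"\x01")  # separator between command line and inputs
        for path in sorted(inputs):
            h.update(path.encode() + b"\0")
            h.update(hashlib.sha256(inputs[path]).digest())
        return h.hexdigest()

    def run(self, command: list[str], inputs: dict[str, bytes],
            execute: Callable[[dict[str, bytes]], bytes]) -> bytes:
        """Return the cached output if the key is known; otherwise execute and store."""
        key = self.action_key(command, inputs)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        output = execute(inputs)
        self._entries[key] = output
        return output


def demo() -> tuple[int, int, int]:
    cache = ActionCache()
    executions = []

    def fake_codegen(inputs: dict[str, bytes]) -> bytes:
        executions.append(inputs["model.slx"])  # record every real execution
        return b"// generated from " + inputs["model.slx"]

    cmd = ["codegen", "--target=ecu", "model.slx"]
    cache.run(cmd, {"model.slx": b"v1"}, fake_codegen)  # miss: generator runs
    cache.run(cmd, {"model.slx": b"v1"}, fake_codegen)  # hit: same command and inputs
    cache.run(cmd, {"model.slx": b"v2"}, fake_codegen)  # miss: an input changed
    return cache.hits, cache.misses, len(executions)


if __name__ == "__main__":
    hits, misses, executions = demo()
    print(f"hits={hits} misses={misses} executions={executions}")  # hits=1 misses=2 executions=2
```

This only works if every input is declared, which is why hermeticity and sandboxing matter for caching: an undeclared input would change the output without changing the key.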
The cool thing here is that in order to build for the target, you need some really weird tools, like code generators, or the operating system that runs on the ECU built into the car. We were able to integrate all of this into Bazel, so all of those steps are Bazel actions and can be cached, which is super awesome. MATLAB code generation can take forever, so this did not only speed up the CI system but also local development a lot. We actually host the remote caching server multiple times for better load balancing, but I guess that's more of a technical detail.

We are also using remote execution; however, we had a harder time setting it up correctly. The thing is, it requires much more hermetic toolchains, which is something we got a bit wrong at the beginning. We are currently using it for our long-running acceptance tests, and we are able to speed them up on average by about five times compared to how fast we were before. We currently use Buildbarn for that, and we host it ourselves on our on-premise cloud. We know that some big cloud companies offer this as a service; however, we are not allowed to compile our source code in the cloud of a company that also develops autonomous cars. So sorry, but if you're in that kind of business, we think this is a really good business opportunity for you.

Yeah, so how do we use it in the CI? When we put all the source code together into one workspace, we would have liked that workspace to be a monorepo, one single Git repository, but due to some legal constraints we are not able to do that. Still, using Git submodules, we stitch together a workspace that feels like a monorepo most of the time. And since we're using Zuul by OpenStack as our CI tool, we can do the same on the CI side: a developer can change code all over the workspace, even in different repositories, and push the changes to the respective repositories; Zuul then stitches all of that together again on the CI side, checks it at once, and either merges all those changes in all those
repositories at the same time, or not at all. So we get this monorepo feeling even though we are using multiple repositories. The CI strategy we are using is pretty primitive: build and test everything for every change, which is the most expensive thing you can do, but luckily, using Bazel, we are able to pull it off. On average we get more than 90% out of the caches, either the local action caches of our build nodes or the remote cache server. So even though we're using this very expensive test strategy, the feedback times are reasonable, and the cool thing about this strategy is that you find defects as fast as possible. One last thing: we are now also able to do incremental builds on the CI. Technically you could also do that with CMake, but we didn't trust CMake enough; now we do.

Okay, so I will tell you a bit about our journey with C++ toolchains. As Axel mentioned, in the early phase we started with the hard things to tackle, and back then the hardest thing you could tackle was getting your cross toolchain integrated into Bazel. So we started with the CROSSTOOL file, which back then was still written in protobuf text format and had no documentation, or at least none that was obvious to us. While doing so, we learned a lot about compilers. First, we were partly using Yocto to build our base system and also our toolchain. Yocto has a smart way of packaging the toolchain in the SDK: you get an archive as a shell script, you download it and execute it locally, it extracts itself, and then it starts patching all the binaries in there so that they find the correct paths. That means you end up with linker scripts in your tree that contain absolute paths. So when you integrate this toolchain into Bazel without running all those scripts first, which is something you certainly don't want to do, you end up in a situation where the linker always complains that it cannot find the files because it
tries to link against your host files. We figured out that, to make the GNU linker happy, the paths in the linker script need this equal sign in front of them so that the sysroot gets prefixed; otherwise it won't be. You give the whole toolchain its sysroot, but at this point the linker simply ignores it. The next thing is that, depending on whether you call GCC with absolute or relative paths, it writes dependency files differently: sometimes the paths to the header files you depend on are absolute in your dependency file, sometimes they are relative, and Bazel relies on relative paths. Because we had to integrate the toolchains with some wrapper scripts, we ended up always calling GCC with an absolute path, so afterwards we got absolute paths in the dependency files, and Bazel complained about undeclared inclusions.

We also integrated non-GCC compilers. As we saw before, we also have microcontrollers and other systems where we get a toolchain that is certainly not based on GCC, not even close to it, and Bazel back then made some assumptions about what your compiler needs to support. That means we also had to write a lot of wrapper scripts to map or rewrite the calls from Bazel to the compiler, to make the compiler work with Bazel, or to post-process certain outputs of the compiler so that they fit Bazel's requirements. Luckily, we have meanwhile migrated to C++ toolchains, so things have become much easier here: it gives us more flexibility in configuring the toolchains for the specific compilers, a lot of wrapper scripts became obsolete, and we don't need to maintain them anymore. But there are still some surprises sometimes; one is that feature configurations still seem to be inherited from the original configuration even if they're not explicitly declared in your own configuration file.
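A minimal sketch of the kind of post-processing a compiler wrapper can do for the dependency-file problem described above, assuming the wrapper knows the action's execution root: rewrite absolute paths under that root in the generated .d file back to relative ones before Bazel reads it. This is not BMW's wrapper; the function name and the /work/execroot/main path are hypothetical.

```python
def relativize_depfile(text: str, execroot: str) -> str:
    """Rewrite absolute paths under `execroot` in a Make-style .d file to relative ones.

    Simplifications: paths containing spaces (escaped as '\\ ' by GCC) are not
    handled, and the result is written as one line per rule.
    """
    prefix = execroot.rstrip("/") + "/"
    # GCC wraps long dependency lists with backslash-newline continuations.
    joined = text.replace("\\\n", " ")
    rules = []
    for line in joined.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        rewritten = [t[len(prefix):] if t.startswith(prefix) else t for t in tokens]
        rules.append(" ".join(rewritten))
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    depfile = (
        "/work/execroot/main/bazel-out/k8-fastbuild/bin/a.o: \\\n"
        " /work/execroot/main/src/a.cc \\\n"
        " /work/execroot/main/src/a.h \\\n"
        " /usr/include/stdio.h\n"
    )
    # System headers outside the execroot stay absolute; in a Bazel C++ toolchain
    # they have to be covered by cxx_builtin_include_directories instead.
    print(relativize_depfile(depfile, "/work/execroot/main"), end="")
    # bazel-out/k8-fastbuild/bin/a.o: src/a.cc src/a.h /usr/include/stdio.h
```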
Next, our journey with Python toolchains. We heard previously, I don't remember from whom, that Python is not integrated that well at the moment in certain parts, and we had the same experience. Our codebase is mainly C++, with Python for all the tooling and tests, so of course we also have certain dependencies on Python packages. We started with pip_import from the rules_python repository, which turned out to unconditionally load all the dependencies you have declared, even if you don't need them. That's okay, I would say, as long as you don't have platform-specific wheels while developing on both Linux and Windows, because then it starts failing: if you run your build on Windows and you have a platform wheel that's only compatible with Linux, it breaks the workspace all the time.

Regarding workspace rules, there's another lesson we learned: be careful with your workspace rules. They're pretty dangerous at the moment since they are not hermetic; you can easily do stupid things in there and break your whole build, and we did, very often. So please do as little as possible in workspace rules. We have custom workspace rules for our wheel archives, which are unfortunately tightly coupled to another workspace rule that allows us to authenticate against our dependency system. Thanks to Bazel 1.2, I think, we finally get .netrc authentication again, so we are now refactoring these rules, and we will also try to open-source them afterwards.
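The platform-specific wheel failure described above comes down to wheel filenames: per the wheel format specification (PEP 427), the last dash-separated field of a wheel's filename is its platform tag, possibly several tags joined with dots. Below is a hedged sketch of filtering wheels by the platform tags a host accepts. It is not how rules_python's pip_import works, the example filenames are illustrative, and the tag sets are simplified assumptions (real compatibility also depends on the Python and ABI tags).

```python
def platform_tags(wheel_filename: str) -> set[str]:
    """Extract the platform tag(s) from a wheel filename (PEP 427 layout)."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    parts = wheel_filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: the platform tag is always last.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {wheel_filename}")
    return set(parts[-1].split("."))


def select_wheels(wheels: list[str], host_tags: set[str]) -> list[str]:
    """Keep only wheels whose platform tags intersect the host's accepted tags."""
    return [w for w in wheels if platform_tags(w) & host_tags]


if __name__ == "__main__":
    wheels = [
        "numpy-1.17.4-cp36-cp36m-manylinux1_x86_64.whl",
        "numpy-1.17.4-cp36-cp36m-win_amd64.whl",
        "six-1.13.0-py2.py3-none-any.whl",
    ]
    windows_tags = {"win_amd64", "any"}  # simplified assumption
    linux_tags = {"manylinux1_x86_64", "linux_x86_64", "any"}  # simplified assumption
    print(select_wheels(wheels, windows_tags))
    print(select_wheels(wheels, linux_tags))
```

Filtering before fetching, rather than loading every declared wheel, is what keeps a Linux-only wheel from breaking a Windows workspace.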
In our tree we also had Python 2 and Python 3 coexisting before we moved on to Python toolchains. This was a challenge for us because there was only one python_top, and we tried to make it as hermetic as possible, so we have our own way of integrating a hermetic Python into the toolchain. Nevertheless, we had to use different python_tops in the different configurations we build, and that was really painful. Eventually Python toolchains came along, and we started migrating to them half a year ago. We found that in the meantime we had accumulated a bunch of wrong assumptions in our tree that relied on python_top. Two weeks ago we finally managed to finish the migration to Python toolchains, so it took us a while. So one recommendation from our side: start migrating to a hermetic Python as soon as possible; that makes life much easier.

Yeah, so the simulation middleware I talked about before is ROS; some of you might know it. It's open source, it stands for Robot Operating System, and it's actually quite popular in robotics, not just in academia but also for production use. We use it a lot, especially for rapid prototyping of our algorithms, where the code is not yet deployed to the final hardware but you already want to try it out on the road. So we put a lot of effort into making ROS work well with Bazel, and the video that is now starting shows what the current workflow looks like. It's just playing in a loop, so maybe let's first watch it.

Yeah, I guess that looked kind of simple, but for us this was a huge step. What you just saw is a developer modifying some source code and then using a one-liner, a long one, I have to admit, but still a one-liner, to verify his changes in the simulator. Before Bazel, that was really hard to do: you had to set up multiple terminals and run the correct commands in the correct order in different terminals to verify that whatever you did works. With Bazel, it's now one bazel run command, and you can check whatever you did in the simulator. It doesn't stop there, though. If we go back one slide: yeah, this is one of our test cars. There's a lot of stuff in the trunk, but especially the black box, which is like a personal computer, and in that computer you can insert your personal hard drive, which
you can unplug from your workstation. You go to the garage, put it in there, and then in the car you start up your personal Ubuntu, run the same command I showed you before with slightly changed parameters, hit the road, and test-drive that thing, all with one command. So there Bazel helped us a lot. For sure, this is for rapid prototyping, where we just want to try out whether something kind of works; the actual testing before we give it to customers looks a lot different, and then you also have all the runtime issues once you start using the final hardware and so on. But when we talk about development speed, Bazel and ROS, and especially the combination of both, worked out really well for us.

Yep, so as Axel mentioned, we developed certain rules for ROS. ROS has its own ecosystem: it comes with a bunch of tools and a bunch of generators, there are multiple versions of ROS, and they also have had multiple build systems: there was rosbuild, then catkin, and now in ROS 2 they even have something new again. So we figured it might be a good opportunity to bring Bazel into it, since they are obviously quite open to new build systems. As you've seen in the architecture picture before, there are nodes. What is a node? A node is basically a system process; it can control sensors or actuators, or do some math calculations. We have a rule for it, a ROS node rule: it consumes a binary that you have previously built as a cc_binary or py_binary, wraps it into a package for the ROS tooling, generates some manifest files, and prepares a workspace, or rather a tree, that can be consumed by the ROS tools. We also have rules for generating messages. What is a message? ROS nodes communicate with each other via messages, either via IPC or via network-based messages if you have a distributed cluster of ROS nodes.
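ROS describes each message type in a .msg file, with one field per line written as "type name" and constants written as "type NAME=value". The toy generator below turns such a definition into a Python dataclass, to illustrate what a message-generation rule produces. It is far simpler than the real ROS generators and is not BMW's rule: it ignores nested message types and fixed array sizes, and it would truncate string constants containing '#'. The message name and fields are made up.

```python
import re
from dataclasses import field, make_dataclass

# Subset of ROS primitive types mapped to Python types.
PRIMITIVES = {
    "bool": bool,
    "int8": int, "uint8": int, "int16": int, "uint16": int,
    "int32": int, "uint32": int, "int64": int, "uint64": int,
    "float32": float, "float64": float,
    "string": str,
}
TYPE_RE = re.compile(r"^(\w+)(\[\d*\])?$")


def generate_message_class(name: str, msg_text: str) -> type:
    """Build a dataclass with one field per .msg field and class-level constants."""
    fields = []
    constants = {}
    for raw in msg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        type_str, rest = line.split(None, 1)
        match = TYPE_RE.match(type_str)
        if match is None or match.group(1) not in PRIMITIVES:
            raise ValueError(f"unsupported type: {type_str}")
        py_type = PRIMITIVES[match.group(1)]
        if "=" in rest:  # constant, e.g. "uint8 CLASS_CAR=1"
            const_name, value = (s.strip() for s in rest.split("=", 1))
            if py_type is bool:
                constants[const_name] = value.lower() in ("1", "true")
            else:
                constants[const_name] = py_type(value)
        elif match.group(2):  # array field, e.g. "float32[] covariance"
            fields.append((rest.strip(), list, field(default_factory=list)))
        else:  # scalar field defaults to 0, 0.0, "" or False
            fields.append((rest.strip(), py_type, field(default_factory=py_type)))
    return make_dataclass(name, fields, namespace=constants)


MSG = """\
# Hypothetical message describing a detected object
uint8 CLASS_CAR=1
string label
float64 x
float64 y
float32[] covariance
"""

if __name__ == "__main__":
    DetectedObject = generate_message_class("DetectedObject", MSG)
    obj = DetectedObject(label="car", x=12.5)
    print(obj)  # DetectedObject(label='car', x=12.5, y=0.0, covariance=[])
    print(DetectedObject.CLASS_CAR)  # 1
```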
Messages are simple data structures, more or less comparable to protobuf. The data structure of a message is described in a .msg or .srv file, and our rule consumes all these message files, runs the generators over them, and creates a library that you can afterwards link into the binary for a ROS node. Then, to combine everything, we have a rule that calls roslaunch. It more or less generates the launch file; a launch file is basically an input file, again, so that catkin knows exactly which nodes it needs to start. The rule combines all the dependencies of your launch target into a tree which is afterwards available in your runfiles, because roslaunch and catkin make some assumptions about what your tree looks like. You cannot simply say, "Okay, I want to start this, this, and this; here are the individual paths," and have it start them. No, it assumes that a certain structure is available, follows that structure, and tries to find, in a smart way, all the nodes it should start. So we have the roslaunch target, which you can afterwards also call with bazel run or bazel test, as we saw in the video before.

Yeah, unfortunately we haven't managed to open-source the rules yet. We are working on that; our goal was to open-source them by today, and we failed, but we will continue working on it. We found that these rules contain certain BMW-infrastructure-specific assumptions that we first need to sort out before we can open-source them. They will be available on our GitHub page, and we would invite everyone to contribute and to exchange ideas on how we can improve the rules for ROS.

Yeah, that brings us pretty much to the last point: when can you buy the first Bazel car? That will be in 2021, so not too far away. Then we will release the BMW Vision iNEXT, and that vehicle, as well as all BMWs which launch after it, will have been built with Bazel, or at
least their driver assistance system will have been built with Bazel. It's not just that, though: toolchains are known to propagate a lot in our industry, so my personal expectation is that in a couple of years other OEMs and suppliers will follow; we already see that. So there will be more Bazel cars on the road soon. That's it, thanks for having us, and we are looking forward to your questions. [Music] [Applause]

Thanks, Axel and Patrick. I am very much looking forward to riding in the first Bazel car; it sounds great. As usual, we'll take questions up here in the middle, to give people a chance to get there. Why don't we start up top first this time and then go from there. We've got about nine minutes for questions.

David, National Instruments. You mentioned that you got five to six engineers to prototype this and kick it off. How many are continuing maintenance on your Bazel-based work?

Well, by now we have several teams which more or less work on our release and CI pipelines, and the people who were in the initial Bazel team are now distributed across those teams. So there's no single Bazel team anymore, as there used to be; there are now teams with different purposes, but basically the knowledge spread as the people spread.

And how would you compare that to what you had before? I mean, how many people maintained the CI system before, with CMake and the other tools that you replaced?

It's kind of hard to tell in our case, because it's not just the build system that grew but also the manpower in general.

Well, I would say there are two things. One thing is that we are still not completely done with it: we still have a lot of things to tackle, to stabilize, to improve, to speed up. And although Bazel is luckily evolving, we also need to keep up with newer versions of Bazel. That means we still need more people than we would need if we had a well-established toolchain that we just need to maintain and see that nothing
breaks, right? So all in all, we are still 15 to 20 people with a strong focus on this Bazel environment.

Yeah, that sounds about right. And I think one thing that also changed within our organization is that the build tool is no longer seen as a must-have on which you try to spend as little as possible, but rather as an advantage when it comes to development speed. So by now we luckily have pretty good chances when we ask for more manpower to improve our build times.

Thank you.

Hi there, I'm Ed Schouten, and I'm the author of Buildbarn. I was wondering: what do you dislike most about Buildbarn?

That's actually a pretty good question; we would really like to get in contact with you, so maybe we can meet later. Just for now: we like it a lot. I mean, there are different alternatives, and this is the best one, so we picked it. I think our current biggest issue is that the monitor is not working anymore since the Bazel 1.0 update, the results monitor. Yeah, but please, let's have a talk afterwards.

Hi, I'm Michael, I work at Tesla, so we're obviously very interested in your work here. I was hoping you could talk a little bit about how incrementality works with respect to your simulations, because naively I would expect any change anywhere in the system to invalidate the simulator, which maybe runs everything as a black box. Could you talk about how that works, and whether caching has helped you at all for potentially long-running simulations?

You definitely hit a pain point there. As you said, the simulations, what we call the acceptance tests, are supposed to test as much of your software stack as possible, right? So any potential change will invalidate almost all of those previous test results, and therefore your cached results. That's why those tests were the first ones we put onto remote execution: they could not really be tackled with caching that much, and that's why we used
remote execution here.

Thanks.

Yep, another one. I'm Mark from Lyft. My question is: how do you maintain consistency in your environments between developer workstations, your CI, and running on the car?

Mm-hmm. So before Bazel, we had a list of Python packages and Debian packages which were installed using the usual tools, either in your CI as a Docker container or on your local development machine. But then you really often ran into those "works on my machine but not on your machine" issues. What we do now is that we still have those Debian packages and pip packages being installed, but we are reducing them one by one and moving them into the Bazel workspace. We have only a few left, so by now we've pretty much nailed it down, and this gives us a level of consistency we did not have before.

Okay. Do you have problems with, say, one of those packages changing in the workspace, and then your entire workspace is invalidated and you have to redownload everything?

Yeah, for sure.

Okay, all right, thanks.

Yep. Austin Chu, Blue River. How have you guys dealt with toolchain qualification for ISO 26262? Sorry: tool qualification is part of ISO 26262, and Bazel becomes part of your tooling, so you have to address it at some level, correct?

Exactly. So we looked at it under ISO 26262. What you typically do is analyze the impact of the tool on the overall outcome and how likely that impact is. We didn't look only at a specific tool, which is what most people often do, asking how that one tool can have an impact; we looked at the whole toolchain and how this tool behaves within it. Back then we figured out that we certainly need not only to qualify Bazel but also to look into the whole tree and see what else could detect problems caused by Bazel. That means we have certain mitigation tests
all over the place, in the end, just to ensure that Bazel's orchestration... I'm still losing my microphone. No, it's there again. Okay. So it's a big effort to get it done, because you still need to know what you're doing, and you need to know what is happening with your artifacts, so that you can derive from that what kind of mitigations you have to put in place.

Thank you.

Hello, my name is Anthony, I work at General Atomics. My team works on autonomous drones, and in aerospace and autonomous industries in general we have to make sure our code follows MISRA standards. I was wondering how your team used Bazel to make sure your C and C++ code is MISRA-compliant.

Oh yeah, again one of our pain points, of course. We have certain MISRA checkers, and also code quality checks and static code analysis; all of this is in place, but it's a bit tricky to get the inputs into the right format for your tools. So how did we do that with Bazel? Bazel offers various ways to extract, or query, the sources that you use for the build, and then you can set up some packages that feed them, outside of Bazel, into these code quality checkers. So it's actually not triggered by Bazel itself; it's done outside.

Hi, John Field, Google. I'm just curious, apologies if you mentioned this: do you use Buildbarn for development builds as well as CI, or just CI?

It's currently only CI, but we are planning to roll it out for developers as well.

Yeah, thanks.

I'm from Europe. You mentioned you are making the build hermetic, including the third-party dependencies, and there are a lot of open-source dependencies. Do you have to write all the rules manually to build other open-source projects, or how do you track those dependencies?

For open source, I assume you mean basically packages like Debian files?

Yeah, those files, or basically, like what we heard before, the workspace files that you
can inherit from other projects. It's not just the workspace: for OpenCV or many others that don't have existing Bazel rules, how do you compile them?

As part of that, we verified several open-source components as well. Luckily, in our environment, because we have to build safety-critical software for a car, there are not that many. So we either consume them as prebuilt binaries, like the platform wheels, or we Bazelify them and put them into our third-party tree.

Okay, that's perfect, right on time. Thanks so much, Axel and Patrick.

Thank you.
Info
Channel: Google Open Source
Views: 5,070
Rating: 4.9298244 out of 5
Keywords: type: Conference Talk (Full production);, pr_pr: Bazel, purpose: Educate
Id: Gh4SJuYUoQI
Length: 45min 34sec (2734 seconds)
Published: Thu Jan 16 2020