Linux Kernel Development, Greg Kroah-Hartman - Git Merge 2016

Captions
Hey, we're running Linux; you never know. All right, I'm Greg. Like you said, I do the stable releases; Linus does the development releases. I'll talk a little bit more about that. As you all know, Git came from Linux: it was our development model that spawned this crazy beast. Sorry. But it works really, really well. So I'm going to talk about what we do and how we use Git, because it's a little bit different than most other groups. First, feel free to heckle, ask questions, interrupt; that makes this fun. Or you can wait till the end, but interactive is good.

So this is the size of the kernel today. As of a couple of weeks ago, that was the 4.5 release. Release numbers are just numbers; they just increment forward and don't mean anything. So for 4.5, like you see: 21 million lines of code. A lot of code. All the drivers for the Linux kernel are in the source tree. This is different from what any other operating system had done before; most of them kept drivers separate. We put everything in the tree, and that makes it better, because we can change APIs, we can change the way things work, we can see how multiple drivers work for sort of the same hardware and merge them together. On average, a driver for the Linux kernel is about one third the size of a driver for other operating systems, so it works out really well.

But still, we have a huge, huge tree, and you don't run it all. My laptop runs about 1.6 million lines, I think. Your phone runs about 4 million, no, 3 million lines of code; it's different from what I run. About five percent of those 21 million lines is the core of the kernel, which everybody runs. The rest is all other stuff.

So here's what we did last year: 4,000 developers, at least 387 companies. I say "at least" because I keep track of this, and I haven't been doing that for the past year. If you submit a patch to the kernel and it's not obvious who you work for, I'll send you an automatic email; again, I haven't been doing that. It should be about 450 companies. We think we cracked 400 different
companies about three years ago. We cracked 3,000 developers four years ago; 4,000 developers is what we're looking at this year. It's the largest software project ever. It's very, very huge.

It's also a pretty fast rate of change, kind of. So this is what we do. It doesn't seem that bad until you realize the scale. This is what runs the world. The Linux Foundation told me to stop using the word "scary" in this presentation; I'll use it a lot. This is a lot. This is supposedly a stable kernel that runs the world. It runs everything: it runs the internet, runs your laptops (well, a few of your laptops), runs all your phones, runs lots and lots of things. It runs Wall Street, runs your air traffic control, things like that. Scary stuff. Sorry, I won't say scary.

No, this is the number you should be scared of. That's a lot. We keep going faster. Every year we think we're plateauing; every year I do this presentation I say, "Ah, we can't possibly go any faster," and every year we do. Five, ten years ago we were doing two and a half changes an hour, and everybody thought that was unsustainable; there was no way we could keep up with that. Five years ago we were doing five changes an hour, and it was, "We can't go faster than that." We're going faster every single year, every single release. We're kind of plateauing, but it still keeps going up and up and up.

And the interesting thing is these changes aren't just in drivers; they're across the whole tree. Out of all those lines of code, I said the core of the kernel is 5%: 5% of these changes are in the core of the kernel. Drivers are about 40%: 40% of changes are in drivers. Networking is about 10%: 10% of these changes are in the networking stack. It cuts across the whole tree. This goes against every previously thought-out software methodology of how to make a stable system. But we created something nobody else had, so maybe we had to break those rules. Again, scary.

We went even faster. In the 4.3 release, two releases ago, we were up to 8 changes an hour. We did break 9 changes
an hour last year for one release. I think the release we're about to do in a couple of weeks is going to be about 9, maybe 10 changes an hour; it'll be the largest release yet. Again, we're going faster and faster and faster.

What this means is: if you were comfortable keeping up with what we did a year ago, and your driver or your tree is not merged into the main kernel, we are going faster, so you have to do more work to keep up with us. That's something a lot of companies don't realize. If they try to fork and go off on their own, that's great, it works for a while; but again, we are going faster and faster and faster, and they can't keep up. They have to invest more money and more time. The best thing to do is merge into the kernel; it costs you money to keep your code outside the kernel.

So how do we do this? Two big things: time-based releases and incremental changes.

Time-based releases: we started this about five, no, geez, ten years ago. We said, let's stop doing this stable/unstable development cycle; let's just do a new release, and everything is going to be stable. We said, let's make it between two and three months, and that's where we are: about every 6 to 7 weeks we do a new release. This is good. It means that if you are developing a new feature and you try to get it merged and it gets rejected, well, you have another release in two months, pretty much, and you can get it merged in there. You don't have the back pressure of "Oh no, we're not going to get into this release, and because we're doing a release every six months or every year, you have to accept it now." We can push it off, get the best technical thing working, and get it merged next time. It takes away the barrier of us having to accept stuff that we don't want to.

It also is very reproducible. We know when a new release is going to come out. Companies can figure out: I want to release my phone on this date, so I'm going to pick this kernel, and that's when I need to
get stuff merged in. It works out really well. We are very, very regular. Linus keeps wanting to go faster. He's knocked it down to six weeks a couple of times; he says he wants to do five weeks. I don't know, that might be tough.

So how do we do this? Here we go, numbers. Linus releases 4.2; that's the zero at the top. I'll talk a little bit more about this later, but all the developers throw a bunch of stuff at him for two weeks, and then he does release candidate 1 and increments the number: 4.3 release candidate 1. Then every single week we do another release candidate. After that first release candidate, it's bug fixes or regression fixes only. We are very, very serious about regressions. Because we move so quickly, we want people to be confident about upgrading. You should always feel comfortable about upgrading a kernel; it should just work. If it doesn't, we did something wrong. We made this statement a decade ago, saying we will not break user space, and we've held to it. Facebook has talked about how they update their internal servers every single release; they haven't had a problem in about three years. It works really, really well.

So: bug fixes only, bug fixes or regressions. We revert things using Git; I'll talk a little bit more about how we use Git later. Then up to rc6, rc7, everything settles down, he does a new release, and off we go. That's how we do it.

We had been doing this for a few years when we realized: wait, what happens if there was a big nasty bug in 4.2? What do we do? So we came up with the idea of stable kernels. That's what I do; I'm in charge of these. I fork from Linus and I do 4.2.1, .2, .3, .4; about every week I do a new stable kernel. The rule for stable kernels is that a change has to be in Linus's tree first, and that's really, really important. We never want to diverge; we never want to take something in the stable kernel that isn't already in Linus's tree. And conversely, I don't want to
take stuff that's been modified. Sometimes people say, "Well, it's a little bit hairy how we did this in Linus's tree, so I'll give you a simpler patch." No, I want the identical patch, because 95% of the time, if I take something that's been rewritten a little bit, it's buggy, because it hasn't been tested. It always happens.

So we have some rules about what can go into a stable kernel. Bug fixes only. It has to be about a hundred lines. It has to be obvious, which is in the eye of the beholder, of course. Or it can just be new device IDs, like adding a new device ID for your USB driver or device or something. It has to be like that. There have been some bigger changes going in. Sometimes the memory management guys say, "Here's this series of 20 patches that are all 100 lines, but they all fix something." I'll take that. But again, we keep on going.

And the more powerful thing is that I get to throw this away and you upgrade to the next one. There's a little bit of overlap of a few weeks, but we throw it away and on you go. So all the distros you run that are community based, like Fedora, openSUSE, Arch, Gentoo, all run off these stable kernels.

The enterprise people pick these kernels and go for a longer period of time; they like a long-term kernel. I used to work for Novell and SUSE at the time, and I was in charge of our kernel team, and I realized that I could use these kernels for my day job, because we were maintaining a kernel for a couple of years. So let's do that. So we have something called a long-term kernel now. I pick one kernel tree a year and I maintain it for two years. Right now it's 3.14 and 4.4. We moved from 3 to 4 just because the numbers got big: you think the difference between 3.17 and 3.21 is less than it is between 7 and 10, and it's all mental.

Normally I pick one a year. 4.4 was odd: at the kernel summit we decided to do something different, and I picked one at the beginning of the year, so that one was a surprise. But this way it actually lined up well. The new
Chromebooks that are coming out will be 4.4 based, and the new Android will be 4.4 based. I go around and talk to companies to see what they're using. I think Debian is going to be 4.4, maybe Canonical; I don't know, somebody else is going to be 4.4. So I maintain these for two years and then I drop them. This works well for a lot of companies. Bigger companies like, again, SUSE, Red Hat, and Canonical maintain them on their own a little bit longer.

In Japan right now they are replacing what they call social infrastructure. I think that needs a better name, because I think of Twitter when I hear that. It's their streetlights and their railway systems, and all of that is converting over to Linux. So a lot of companies there have come to me and said, "We need you to maintain a kernel for 20 to 30 years," and I said, "Yes! Retirement." So they're going to have an interesting problem. They're probably going to pick the next long-term stable kernel I do next year, and they want it for 20 years. That's an interesting thing: how are we going to support that, and what are we going to do? So I'm working with a number of companies there and the Linux Foundation, and we're going to try to figure out how we're going to maintain a kernel for 20 years. That's going to be interesting. Think about what Linux looked like twenty years ago; it was pretty bad. The interesting thing is going to be having hardware that runs for twenty years, so I might end up with a big stoplight in my living room or something. I don't know.

So yeah, long-term kernels. Questions? This is how we do releases. You guys are easy. Oh, come on.

All right, now let's talk about Git. Developers: we have almost four thousand of them, and they make a patch, they make a change. Every change that goes into the Linux kernel has to be standalone, it has to not break anything, and it has to be, quote, correct. We cannot break the build. All those lines of code, all those changes that go into the kernel: none of them breaks the build.
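The stable-kernel rules listed earlier in the talk (already in Linus's tree, bug fixes only, about a hundred lines, or just new device IDs) can be sketched as a small check. This is only an illustration under stated assumptions, not a tool the kernel community uses: the `Patch` fields, the `stable_candidate` function, and the hard 100-line cutoff are all hypothetical, and the sketch does not model the "obvious" judgment call or the exceptions for larger series that the talk mentions.

```python
from dataclasses import dataclass
from typing import Tuple

# "About a hundred lines", as stated in the talk; the hard cutoff is an assumption.
MAX_LINES = 100


@dataclass
class Patch:
    """Hypothetical summary of a candidate patch (not a real kernel format)."""
    in_linus_tree: bool         # already merged, unmodified, in Linus's tree
    is_bug_fix: bool            # fixes a bug or a regression
    only_adds_device_ids: bool  # e.g. a new USB device ID and nothing else
    lines_changed: int


def stable_candidate(patch: Patch) -> Tuple[bool, str]:
    """Return (accepted, reason) under the rules as described in the talk."""
    if not patch.in_linus_tree:
        return False, "must be in Linus's tree first"
    if patch.only_adds_device_ids:
        return True, "new device IDs"
    if not patch.is_bug_fix:
        return False, "bug fixes only"
    if patch.lines_changed > MAX_LINES:
        return False, "too large"
    return True, "small bug fix"


if __name__ == "__main__":
    print(stable_candidate(Patch(True, True, False, 12)))
    print(stable_candidate(Patch(False, True, False, 12)))
```

Checking the "in Linus's tree" rule first mirrors the emphasis in the talk: a fix that has not been merged upstream is rejected regardless of how small or obvious it is.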
Oh, one other thing I've got to mention: those are the patches that are accepted, not the ones that are submitted. On average I accept about one third, or on a good day one half, of the patches that are sent to me. So there's a lot of work going on out there. It takes people a number of tries to get changes in, and a lot of stuff gets rejected, so there's a lot more work than just what you see accepted. That's just to give you a sense of how big our scale is.

OK, developers. When they make a change, it has to be obvious, and it has to be broken down into one thing. If you're going to do a complicated thing, we require you to show your work. Like your old math professor said, you break your stuff down into individual steps. Along the way every change is correct, no change breaks something, and you have to show a long series. This puts more burden of work on the developer, but developers are what we have a lot of. So we waste developers' time, because we don't have many maintainers and we don't have many reviewers; that's our most precious resource. You need to send me a set of patches that look obviously correct, so that I'd be stupid not to accept them, because they're just so easy and simple and you broke it down and showed your work. It's hard to do that. It's a hard development process for people to learn, but that's what we make developers do.

So they make a change, and then they send it through email (sorry) to the owner of the file or the driver. How many did I count? We have 1,000 maintainers these days, which is crazy if we have 4,000 developers. But I looked at the list, and there are a lot of people on the list of maintainers who haven't done a lot of work in a long time, so they don't show up on the list of developers. I think we have about 700 active maintainers. So they make a change and they send it off through email. We have mailing lists for every different subsystem of the kernel: there's a USB mailing list, SCSI, block, memory
management. There's the big Linux kernel mailing list, which gets about 400 to 500 emails a day. The big secret is nobody reads that; we all filter. Andrew Morton, I think, reads it all, but he's different. I'll show you some more of the work that Andrew does later.

So they make a change and we review it. We look at it in email and we respond with inline comments. We don't top-post, we don't post in HTML; it's all plain text. And that's good. It's good because we want people who might not have English as their first language; we want people from other backgrounds. Through email you're anonymous in a way: it's just what you're writing, just the text right there. Some projects, and I will point at OpenStack, make people get together in a room or make people work on IRC together, and I don't think that works well. I want it so that if I respond to an email, they can take a day, they can run it through Google Translate, they can think about it, and then they can respond back. That works better for people for whom, again, English is not their first language. We want to be much, much more inclusionary. We don't know race, we don't know nationality, we don't know anything, and we don't keep track of that, and that's good. So again: email, plain text, old school. It works really, really well. They look at it, they say yes, no, whatever, and they go on.

So here's a change. You all know what patches look like; this is a really old one. We make people include a few interesting things. First line: again, good Git history. I mean, we created the Git style: a one-line summary, then changelog text saying what it did. Then we say Signed-off-by. Every person who creates a patch has to say Signed-off-by. We don't require copyright assignments, we don't require CLAs. You own the copyright; you just have to say Signed-off-by. I'll talk about that in a minute. Then the owner of that subsystem at the time was David, for USB gadget. He said, "Yes, I acknowledge that, looks good," in an email. And then I picked
it up at the time and I said, "Great," I signed off, and I added it to my tree. The change was an obvious one: let's actually look at this variable before we dereference it. Isn't C wonderful? That's it. That's a patch: obviously correct, and away it goes.

So let's talk about Signed-off-by. The Git developers know this, because they require it as well. Linus came up with the DCO, the Developer Certificate of Origin. It's a little bit more complex, but this is what it means: you're allowed to contribute this change to this project under the license of the project. A lot of other groups have picked this up: etcd from CoreOS is using it, Samba is using it, Docker is using it, a lot of other groups are using it. I really recommend it. It's a very solid body of legal work. Again, these are the legal terms, but it's very readable. It's something really, really good that Linus did. It's like the exact opposite of a CLA: it's giving permission, not requiring all these other things. It works really, really well; the Git developers use it, and it works out well. So everybody says Signed-off-by.

What this means is, if you go back and look at this patch, not only is it signed off, so there's a legal term there, it's now a path of blame. If this is wrong, somebody gets to say, "Hey Greg, David, Robert: fix it." And your name is on it. It isn't a company's name, you're not hiding behind an alias; it is your name on the patch. And when your name is on something that's public, you do better work. You just do. That's a really, really powerful social experiment that's made the Linux codebase really good.

But it also means it's the best-audited body of work. I can take any one of those 21 million lines of code and you can git blame who changed this line and who reviewed it: two people, for every single line of the 21 million lines of code. I've given talks to companies about this; they can't even claim that for their internal codebases. So not only is it the largest community-driven software
project as far as size, it's also the best-audited body of code. There's no question where everything came from, which is pretty amazing. So: path of blame. It's fun.

So then the developer sends it to a subsystem maintainer. Subsystems are like USB, PCI, networking, wireless. And now we all have Git trees. There's a bunch of Git trees; we use git.kernel.org. We don't really use GitHub. I think we have about two hundred different trees publicly. A few people use GitHub, but almost everybody uses git.kernel.org. These trees are public, and we have different branches: one branch that's going to go to Linus now for bug fixes, and one branch that's going to go in the next release. We put the patches in there, and that's public. That's an immutable branch; we don't rebase, never rebase, and it's public now.

Then what we do is, every single day, Stephen Rothwell in Australia takes all those subtrees and merges them together, and then he builds them on about 20 or 30 different architectures, maybe more now, and he boots them; I think he boots 30 different ones and tests them. So I'll get an automatic response every night if something breaks, like if my tree messed with the networking tree, or my branch over here messed with this other subsystem's branch, and whether it boots. So linux-next happens every single weekday. It's really good. If you want to see what the next version of Linux is going to be, use linux-next. If you want to do development, do development off linux-next, not off what Linus has, because what Linus has is old. This is what happens: every single day, all our trees get merged.

Yeah, that's the first time there's a compile test at the next level. Hopefully this person is going to compile it, and then that person, and then that person. I have gotten a lot of patches where this person never did a compile test, so it's up to the subsystem maintainer to at least build it. But there's something else that happens; I'll get back to that. linux-next just tests the merge, and tests that the merge
works. Because even though I'm the maintainer of USB in the kernel, you don't own anything absolutely. The networking people can say, "Hey, we've got networking drivers that mess with the USB stack; we need to change this over here," and then, great, it goes in through the networking tree. Sometimes the maintainer will see it, sometimes they won't. Other people can change your code. Yes, you're a maintainer; yes, your name is on it; but other people can change it as well. It isn't absolute. That works out great.

Then Andrew Morton is over on the side, picking up things that aren't maintained. We have a number of subsystems where people don't maintain them anymore; of those 1,000 maintainers, some people don't respond to their email anymore. Some people pass away. You know, we're all human: people die, people move on. So he's picking up all the random pieces, and he has a tree out there. Andrew doesn't use Git; he uses quilt. Quilt is a stack, a bunch of patches on top of a base. It works out really nicely, and I use quilt also for the stable kernels. It really works well for that, because he can rebase his tree and do fun things like that. So that's how it happens.

This was working really well until, a couple of years ago, we realized nobody was testing anything. It's really hard to write a test for an operating system. You can say, "Hey, it booted, that works," which is a non-trivial thing, to be sure. But it'd be nice if other people tested, and nice if we didn't break something on one of those different architectures we support, like 50 or 60, now 80 different architectures. So Intel came along. Some developers at Intel did a skunkworks project; I think they just grabbed a whole bunch of CPUs no one was using. We don't know what they're using. They created something called the 0-day bot. The 0-day bot scans all our trees, all our public trees, and test-builds them. They build them on, I don't know, 50 different architectures, different random configurations, different other things, and they run tests.
They run static analysis tests. We have a bunch of static analysis tools: Coccinelle, which is a really good tool for finding patterns in C code and seeing what's wrong, and we have a ton of those. We have other static analysis tools; we have something called Sparse, which Linus wrote, and a number of other ones. They all run through this thing. He is also now picking up patches off mailing lists: you post a patch to a mailing list and you'll get a response saying, "This broke something," which is great. I don't know how he does it. I don't know what's really behind it. I talked to him last year and he said he can handle 7,000 more Git trees. I think it's a testing ground for Intel's new processors. I don't know what he does; we're happy, though.

So it tests everything. It tests every single one of the commits you push. One day I landed after taking a flight, I pushed a bunch of work out, and I went to get a coffee. Fifteen minutes later I got an email saying, "This patch in the middle of all these commits broke the build, and here's an automatic patch that fixes it." So now we have scripts that are writing patches. Who owns that? That's another interesting thing the lawyers are going to have to deal with. Anyway, the 0-day bot tests a bunch of things, and that's where we're doing testing. We're doing a bunch of testing; the performance guys are doing testing in there to make sure that we don't slowly degrade as we add new features. That's where the testing happens, every single day, really, really fast. It's amazing, the processing behind that.

So then, I said we had a merge window. When the merge window happens, all the subsystem maintainers throw their stuff at Linus. The rule is it had to be in linux-next first. We used to be really bad about that; now we're getting better. I think about 95% of the patches that end up in Linus's tree were in linux-next. The other 5% sometimes come in during the merge window. There's like two to three percent that we don't know where they
came from. That's not good; we're getting better. But I tell Linus, "Pull from this branch." Linus does not pull from linux-next on his own, and that's important, because sometimes my tree can be broken. I maintain the tty and serial drivers as well. One release we had something wrong; it just wasn't working. With all these changes it wasn't working well, and we couldn't figure it out. So I said, I'll just wait; I'll hold off on the big merge to Linus, I'll send a few bug fixes, and let's wait till the next release. Again, it's two and a half months; it's not a big deal. If he had pulled my tree, it would have broken for a lot of people. So we throw things to him and then he merges.

In a merge window he's taking about 10,000 to 11,000 patches in two weeks. Git is nice; it's all done through Git pulls, except for Andrew Morton. Andrew sends him email series and he applies them. Again, Git applies mailboxes of emails very, very easily, because that was our development process. It works really well. But that's all Git: everything from blue on up is Git, and it works really fast, really easily. And then he does a new release, and off he goes.

So, questions? Oh, come on. Yes?

So the question is, because we have to break everything down into individual patches, do we do a refactoring and then a fix? That's really rare. Normally, if I'm going to refactor something, it's usually the next patch that does something real, and even then that's very rare. But again, if we take 10,000 patches and five of them are refactorings, OK, we'll take it. If you're doing that, again, that's a step along the way; you have to show your work, and that's fine if you want to refactor things. We do like to see, if you're going to do a long series in which you're fixing some bugs, the bug fixes first. Fix the bugs first; that way we can backport them into a stable tree more easily. We don't want to have to deal with a refactoring, because
then that can get messier. But yeah, it's actually pretty rare.

What happens if you and Linus are on vacation, or you're on vacation, or Andrew Morton is on vacation, so the human element breaks down? Ah, good segue: what happens if the human element breaks down?

So this looks like a nice pretty tree, right? A triangle. I graphed this one year, and it turned out to be 1 meter by 10 or 15 meters long. It's a mess. It's a network of connected people; it doesn't look this pretty. We can route around anybody. That's the good thing about Git and the way our patch process works. If I go on vacation, I just get routed around, and it works fine. Patches come in through somebody else. Again, in networking, David will say, "Hey, I'll pick up USB patches for a week, no problem." It goes in through him. It works really, really well; everybody just gets routed around.

Another thing about this: because we're using email in the beginning, we all review everything. But as we move up the stack, we start using Git, and I can't see those patches. So what we have here, really, in kernel development is not so much a tree of development where we're reviewing everybody's stuff, but a tree of trust. I trust the people I take a pull request from. I don't always trust that they got it right; I trust that they'll be there to fix it if they got it wrong. And that happens a lot. There are some people where I'll just blindly say, "Great, Alan, I'll take your patches, no problem," because I know he'll be there to fix it when it's wrong. That's the important thing. If I'm taking patches from people and I put my Signed-off-by on them, I'm in the path of blame; I'm now responsible. I have enough work, so I want to take patches from people who I know will be responsible for fixing them.

So if you're trying to get into kernel development, it's hard, because we have to trust you. We had a problem with the networking stack about five or six years ago. A huge, nasty, hairy change finally landed; it was big and complex. The day
after it was merged, the email address disappeared. It took them six months to unwind the mess. The networking developers are very paranoid now: you have to have shown a history of good commits, good changes, good support, being there and asking questions, before they'll take your patches. Sometimes I'll just ask a dumb question back to make sure that the person is even listening, like, "Hey, why'd you do it this way?" or things like that, to make sure there is going to be some feedback. You are going to become part of this community. I'm not going to be responsible for maintaining this code forever. Because we do: I maintain code that I wrote almost twenty years ago, and other people do that too. It's nasty code, but we all work through it.

But again, we have a web of trust. I trust five to six people; Linus trusts ten people. It's, again, a tree of trust, and that's how it works. So the kernel development process, because we put our names on things and because we take pull requests from other people, is really a development model of human interaction, of trusting people. We're trusting that people will do the right thing, that you'll be there to fix things. That's what the kernel development process is. It's not just pure technology, it's not just blind patches flying around; it's people who know each other. We travel around; we meet each other once a year. We have subgroup meetups once a year for different subsystems. We meet and work with other people and other developers. That's good. One of my best friends now lives in Germany; 15 years ago he didn't know English. It's a really weird development model, but it's human interactions, and through human interactions and human responsibility for their changes we created something that works really, really well, something no company could ever have created.

[unclear] I don't care about your background; I care that you're going to be there. I have to rely on you, because if I take code
from you... For fixes, for one-off drive-by patches (about half of the patches we take come from people we never see again), that's easy, that's fine. A one-off spelling fix, a new device ID, a simple bug fix: great, I don't care. If you're adding new features, if you're adding a new subsystem, yes, I want to make sure you're going to be around. I don't care who you are; I just want to make sure you're going to be there.

Funny story. Many years ago, we were worried about Microsoft sending us things, and we were all being paranoid. Somebody showed up out of nowhere with these beautiful patches for the plug-and-play subsystem. We had no idea how that came about or where they came from. So I made him prove (I think it was a him, because his name was Adam) where he got the information. He pointed out the public knowledge, pointed out where it was. A couple of years went by. We have a big kernel developer meeting every year, and he got invited. It was in Canada, and he showed up with his mother, because he was in high school. We had no idea. He is now a professor at Stanford, a really brilliant guy. But again, we had no idea that he was a high school student, and it didn't matter, because he was the maintainer of that. He proved, through the back-and-forth, where he got that information from, it worked out, and we went on from there.

So, yes: how do I deal with conflicts between maintainers? It usually doesn't happen. Git is good at merges, as you know. The kernel tree is pretty diverse: USB doesn't touch networking, doesn't touch SCSI, these other things. Everything is pretty standalone. It's up to the subsystem maintainers, usually, if they're going to have to deal with merge issues. But other than that, if you look at the linux-next mailing list, Stephen will send out an automatic response saying, "Hey, this conflicted with this; I don't know what to do," or "Here's the merge that I did; is this correct? Tell us." So I
get one of those about twice a week. So it happens, but they're usually minor conflicts or API changes. If somebody changes an API in a tree like networking, and I'm adding a new driver through this other tree, well, I can't break my tree, so I have to be aware of that, and he notifies us. But it's usually pretty minor for the amount of changes we do. It works out really, really well. It's scary how well Git can handle merges.

Yes: I take a bunch of patches from all over the tree. I could use cherry-pick and put them in a branch. But the problem with my patches is that I then post them for review, and sometimes the fifth patch of a 20-patch set (because they're all individual patches) shouldn't be in there. The maintainer says, "No, wait, that broke," or "No, you shouldn't do this," or "Wait, quick, add these other ones after that." I would have to rebase that tree, and I don't want to rebase a public tree, ever. So I don't do that; I use quilt. Quilt lets me put in, remove, reorder, and restructure things, and then when I do a release I do quilt am,
or, what is it, git quilt mailbox... well, git quilt apply, sorry. It's a command that only me and a few Debian developers use, based on how badly it works. It creates a Git tree, applies the patches, and goes on from there. So once we do a release, it goes from quilt into Git.

Andrew wrote quilt. Actually, he wrote something previous to quilt. So again, he takes a lot of disparate things from all over the place and he reorders them. He also doesn't send them all to Linus; some of them he'll hold on to for three or four releases. He'll pick up random things, and then he'll sometimes send those random things to the other subsystem maintainers because we missed them. When he does that, he'll just drop them from his tree if he sees that they show up in our tree. So he uses quilt because he can reorder things and it doesn't bother anybody. They're two different models, and they both work really well. I really recommend people look at the different models if you're curious.

Yes, I think this is the last question; we're over time.

So what about performance people, who would touch things across subsystems? It doesn't happen. The only thing that would happen performance-wise, I'm thinking of SCSI or block: the block layer people have to worry about certain things. Again, the memory management guys... I always joke, "Hey, you're writing a driver for memory; aren't you guys finished yet?" But they keep going at it. Again, we are in pretty siloed things. We all work together, but it works out well; very rarely do we cross paths.

So I'm out of time, and I'll be here today. Thank you very much. And again, the talk is online, and that was my obligatory penguin picture.

One more really warm welcome for Greg. Thank you, sir.
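The Signed-off-by "path of blame" described in the talk can be illustrated with a short sketch that pulls the trailer lines out of a commit message. This is a minimal example under stated assumptions, not how Git or the kernel tooling does it: the `path_of_blame` function, the regular expression, and the sample message (including every name and address in it) are hypothetical, and only three trailer types are recognized.

```python
import re
from typing import Dict, List

# Matches trailer lines such as "Signed-off-by: Some Name <addr@example.org>".
# The set of recognized trailers here is an illustrative subset.
TRAILER = re.compile(
    r"^(Signed-off-by|Acked-by|Reviewed-by):\s*(.+?)\s*<([^>]+)>\s*$"
)


def path_of_blame(message: str) -> List[Dict[str, str]]:
    """Return the people named in a commit message's trailers, in order."""
    people = []
    for line in message.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            people.append(
                {
                    "role": match.group(1),
                    "name": match.group(2),
                    "email": match.group(3),
                }
            )
    return people


# Hypothetical commit message in the shape described in the talk:
# one-line summary, changelog text, then the trailers.
EXAMPLE = """\
usb: gadget: check pointer before dereferencing it

Hypothetical changelog text saying what the change does and why.

Signed-off-by: Alice Author <alice@example.org>
Acked-by: Sam Subsystem <sam@example.org>
Signed-off-by: Mia Maintainer <mia@example.org>
"""

if __name__ == "__main__":
    for person in path_of_blame(EXAMPLE):
        print(f'{person["role"]}: {person["name"]} <{person["email"]}>')
```

Reading the trailers top to bottom gives the order in which the change was handled: the author first, then each person who acknowledged it or passed it along.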
Info
Channel: GitHub
Views: 66,299
Keywords: git, github, git basics, VCS, programming, version control, open source, software development, octocat, linux, linux kernel
Id: vyenmLqJQjs
Length: 37min 34sec (2254 seconds)
Published: Wed Nov 16 2016