Creating and maintaining a conda-forge package

Captions
Hello, this tutorial is about making conda-forge packages, by me. This is the logo of conda-forge; I'm not affiliated with them, but I put the logo here mainly because the slide was very empty. This tutorial is going to go over the steps and the theory of how to do it, but not all the details, because each package will be very different, so I can't tell you exactly how your package will need to be made. But at least you'll get some idea from this if you're just starting out. Just a disclaimer: I am not an expert. I have made a couple of conda-forge packages, but definitely not an insane number. Since I couldn't find another good YouTube tutorial on this, I thought I'd make one, because some people, like myself, prefer watching a YouTube video over diving straight into documentation. Still, to make your package you will probably have to read some documentation in the end, but at least here I hope to give you some idea. To start (you can skip over this if you already know what conda and pip are): conda and pip are both package managers with which you can install, usually, Python packages. Users often don't really know what the difference is and why one would prefer one over the other; the commands are pretty similar. So why even bother making conda packages if we have pip? pip is kind of the default package manager for Python. It's used to install Python packages from PyPI, pypi.org. There are lots of packages on PyPI, almost 280,000 now I think, so quite a lot. Usually the packages are uploaded in a source-code format: what you actually upload is a tarball of just the Python source code, because you generally don't need to compile Python, right? And the installation instructions, which just tell pip how to actually install the package, meaning how to put the files in the right place, are in
a setup.py file that you write as a developer, and then everything is handled for you. That's basically how you create a pip package: you create a setup.py file with the right instructions. This works great if you're working with pure Python packages, but once you start having other languages in your project, things can become a lot more complicated. For example, if you rely on certain C functions, or maybe C++ or Fortran, then that code has to be compiled on the user's system, and you have to write those instructions into the setup.py file too. It can get quite complicated, and things can go wrong if the user's system doesn't compile things correctly. So, generally, combining pip with other languages is really not ideal, and this is where conda has an advantage. With conda you're not downloading from PyPI; instead you're downloading from Anaconda Cloud, which is a completely different repository. The main difference is that with conda you're usually not downloading source code but pre-built, platform-dependent binaries. So if you have C, or some other code in your package that needs to be compiled, this has already been done by the people who uploaded it to Anaconda Cloud, and when you install with conda you're just downloading the binary; you don't have to compile anything on your system. That's quite convenient from a user's perspective. Generally, conda can manage any type of software in any language, not only Python, and it's great when you have multiple languages in your project; I would say that's much easier to distribute than on PyPI. One of the ways conda got popular is its combination with the scientific Python stack, NumPy, SciPy and Matplotlib, because all these libraries depend heavily on C, and with conda that was much easier to manage.
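To make the "pre-built, platform-dependent binaries" point concrete: conda publishes compiled packages separately per platform subdirectory ("subdir"), such as linux-64, osx-64 or win-64, while pure-Python packages can be published once under noarch. The sketch below maps Python's platform.system() and platform.machine() values onto those labels. The helper names and the lookup table are illustrative assumptions that cover only common platforms; this is not conda's own detection code.

```python
import platform

# Illustrative table (an assumption, not conda's source code): maps the
# (platform.system(), platform.machine()) pair onto the conda "subdir" under
# which pre-built binaries for that platform are published. Only common
# platforms are listed; pure-Python packages can instead go into "noarch".
_CONDA_SUBDIRS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Linux", "ppc64le"): "linux-ppc64le",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
    ("Windows", "AMD64"): "win-64",
}


def conda_subdir(system: str, machine: str) -> str:
    """Return the conda subdir label for an OS name and CPU architecture."""
    try:
        return _CONDA_SUBDIRS[(system, machine)]
    except KeyError:
        raise ValueError(f"no conda subdir known for {system}/{machine}") from None


def current_subdir() -> str:
    """Subdir of the machine this runs on (raises ValueError if not in the table)."""
    return conda_subdir(platform.system(), platform.machine())
```

A package with compiled C, C++ or Fortran code needs a separate binary for every subdir it supports, which is exactly the work conda users are spared when they install instead of compiling locally.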
Conda is created by Anaconda, which is a company, and it's distributed through the Anaconda or Miniconda distributions that you can get from their website. Conda doesn't rely on your system's Python interpreter, so it can behave like a platform-independent package manager, much like the system package manager on a Linux distribution. Another feature of conda is that it can create virtual environments: nice little boxes on your system where you have, let's say, different versions of different software installed, completely isolated from each other. That's also a very nice feature. So conda is convenient when your project uses multiple programming languages, especially languages that need to be compiled; it's a very convenient way to distribute your software. But one of the downsides of the Anaconda Cloud infrastructure is that the packages are distributed over multiple channels. Usually, when you install something via conda, you type conda install followed by the package name, but what you're doing then, unless you've changed the settings, is looking inside only one channel, namely the official channel run by the Anaconda company. If you want to download from any of the other channels, you have to add an extra argument, -c, followed by the channel; to take the title of this presentation, that would be something like conda install -c conda-forge followed by the package name. Now, with all these packages, in different versions and with different builds for different platforms (you can see lots of platforms here, some packages support fewer, and different versions are available), it can get pretty messy which channel you want to install something from. This is the problem that conda-forge is trying to solve: they are trying to be sort of the default
community package channel. The official channel is completely managed by the Anaconda company: you cannot add packages to it, you cannot contribute to them, and you cannot update them. That is totally out of your hands; you can only upload to your own channels, separate from the official one. But you don't want your software distributed over so many different channels; you just want one community channel, and that is what conda-forge aims to be. If we go in there, you can see that they have around 12,000, almost 13,000, packages on conda-forge, and conda-forge is extremely popular: the conda-forge NumPy package is actually more popular than the so-called official NumPy package, because conda-forge also tries to stay on top of the game in terms of having the most recent package versions. Versions are often more recent on conda-forge, which is nice. Before we go into how to create a conda-forge package and contribute to conda-forge, we will first look at how conda packages are built, because this is also how conda-forge works in the background. Understanding the structure and idea behind building a conda package is sort of a prerequisite for the conda-forge process, so I'll just run through the example that you can find in the conda-build documentation. If you don't care about this, you can always skip ahead. What we're going to do now is build a package locally on my system. This is not how conda-forge works, but you'll see the structure of the package anyway. The first step, of course, is to install conda-build (conda install conda-build), and you'll see that it's already installed on my system, but we'll let it run anyway. Okay, it appears I have to update my conda version, but you can see all the requested packages are already installed, and now I have access to the conda build
command. What we need to do first: here I'm in this example directory, which should be empty, yes. We're going to build the same package as in the documentation, called click. It's already available on PyPI, which saves us a bit of time, because conda-build has a utility with which you can automatically create a template from a package that's available on PyPI, so you don't have to start from scratch, which can be more work. So what we're going to do (I'm always typing in the wrong window) is conda skeleton pypi click. That runs for a while: it finds the package, looks through the setup.py file to find the necessary dependencies, and creates a bunch of files in a temporary cache folder from which a build can then be generated. It appears to be done now, and if we look in this folder, we can see that a new directory called click has been generated, and inside you will find a meta.yaml file. This is the key part of creating a conda package, and I wanted to go over it because you'll see it come back in the conda-forge part. The metadata this file contains is: some information on the package, like name and version; the place where the source code can be downloaded from, in this case of course PyPI (all of that is filled out automatically for us here); how to actually install the package; which dependencies we have; and then metadata like where the project is hosted, what license it has and a summary. You'll probably have to edit the maintainer part, since you will basically be the maintainer of this package. That's more or less it. Now, if you didn't start
from a skeleton, you would have to write out this kind of structure manually. Note that Jinja templating is enabled in this file, so we can set a variable, here the name, click, and then reuse that variable in other places throughout the meta.yaml file. We'll exit out of that. In this case we don't need to make any edits, but with a more complicated package we will most likely have to make a bunch of edits in this meta.yaml file. Here, because it's the example from the conda-build documentation, we can just run conda build click, and this again takes a while. Okay, now it's done. What happened is that, in some rather hidden-away location on your system (it tells you where), a tar archive has been created. You could upload this to Anaconda Cloud on your own channel, and it tells you how to do that, or you could install it on your local machine, I think with conda install --use-local click or something like that, but check the conda-build documentation for this; it's somewhere in there, and I'll link to it in the description. So that's basically it. Now, and I can't stress this enough, this seemed quite simple; we just had to wait a bit. But in fact you will most likely have to do quite a bit of editing in here, especially in the requirements section, and getting that to work properly can be a lot of work. The dependencies listed there (here there are basically none, but you might have a lot) need to be names that can be found on Anaconda Cloud, on the channels available to you. When you're building conda-forge packages, they will have to be on the conda-forge channel. So, for example, an issue
that could come up: if the package on PyPI depends on PyQt, which is for making graphical user interfaces, then whatever package name the conda skeleton command automatically detects and enters here will be incorrect, because the name is actually different. On Anaconda Cloud you install it under the name pyqt, but if you install from PyPI with pip, you have to use PyQt5; there is no plain PyQt package there. So the same package can have different names on the two systems, and that's definitely something to be aware of. Make sure that whatever you put in the requirements is actually a name that can be found on the Anaconda Cloud channels, and that the channel is accessible to you: if you have set in your configuration file that you only want to download from the anaconda and conda-forge channels, then conda won't look for the name in any other channel. Now, this was a pure Python package, and there's only a meta.yaml file, right? But if you're building a more complex package, you will most likely need some command-line instructions to tell conda-build what to do; it won't know what to do in certain cases with C++, C or Fortran code. This is where build scripts help. Let me see if I can find that. Yes: you need a meta.yaml file, which is required for all packages; this is the recipe. But for more complex packages, non-Python packages for example, on Linux and macOS you will need a build.sh script, which is run after some preliminary steps (here, at step seven, it runs whatever you have put in the build script), and if you need to build on Windows, you will need a bld.bat file. And then
optionally you can have a couple of other files, but these three are really key. So if you're building more complex packages, you need to take real care with the build.sh or bld.bat file; that's very important, and it will also help you understand the next part. Now let's finally go over to making conda-forge packages. Once you're ready to make your own recipe and contribute it to conda-forge, the first step is to go to the GitHub repository conda-forge/staged-recipes and fork it. I've already done this, so I have my fork over here; that's me on GitHub. To create a new package, you create a new branch, named after your package for example. You can see I already made a dummy branch, but let's say we want to add a new package, "add package Y" or Z, right? Then on this branch we write the recipe. How do we do that? We look at recipes/example, which is meant as a template, and basically edit a copy of its meta.yaml file in a new folder for our package. You can see it's very similar to what we saw before, only there are a couple of comments in between telling you what to do. It's a very simplified one, like the other one, so for more complicated recipes you'd best look at some other examples, and we'll go over that later. I'm not going to start a new recipe from scratch, because it's quite a bit of work; instead we'll go into one of the packages I've made before. That's on the branch exit wave, and it was to build a package called exit wave reconstruction. This is a package I found on GitHub, and it's C++: if you want to build it on your system, you need to do all kinds of gymnastics. It's generally not much fun to build this kind of stuff.
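For a pure-Python package, the meta.yaml you put in your package's folder under recipes can stay small. The sketch below renders a minimal recipe of the kind described above: Jinja set variables, then the package, source, build, requirements, test, about and extra/recipe-maintainers sections. It assumes a noarch Python package installed with pip from a PyPI source tarball; the helper name and every value passed in are hypothetical placeholders, and a compiled package like exit wave reconstruction would additionally need compilers, CMake and build.sh/bld.bat scripts. Treat it as a starting point to check against recipes/example and the conda-forge documentation, not as a definitive template.

```python
from string import Template

# Hypothetical template for a pure-Python, noarch recipe. The section layout
# follows the usual conda-forge meta.yaml structure; every $placeholder is
# filled from the keyword arguments and must be replaced with real data.
_META_YAML = Template("""\
{% set name = "$name" %}
{% set version = "$version" %}

package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: $sha256

build:
  number: 0
  noarch: python
  script: {{ PYTHON }} -m pip install . -vv

requirements:
  host:
    - python
    - pip
  run:
    - python

test:
  imports:
    - $import_name

about:
  home: $home
  license: $license
  summary: $summary

extra:
  recipe-maintainers:
    - $maintainer
""")


def render_meta_yaml(**fields: str) -> str:
    """Fill in the template; raises KeyError if any placeholder is left unset."""
    return _META_YAML.substitute(fields)
```

For example, render_meta_yaml(name="mypackage", version="0.1.0", sha256=..., import_name="mypackage", home=..., license=..., summary=..., maintainer="your-github-handle") returns the text to save as meta.yaml. Only the dollar-sign placeholders are substituted here; the double-brace Jinja expressions are left for conda-build to evaluate.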
And I ran into lots of issues when trying to compile it locally, so to save other users from this, I thought, why not create a conda-forge package out of it? It's a nice example because it's a bit more complicated than a pure Python package. If we go in here, on this branch, and look in recipes, you can see there's the example folder, and we've created two new folders in there. This one, CImg, I needed to create because it was a dependency of exit wave reconstruction that wasn't yet available on conda-forge. That can happen: if the dependencies of what you want to build are not on conda-forge, you will first have to create recipes for them too, to get them into conda-forge, which can be some work. If we look in the CImg folder, you can see that the meta.yaml file looks more or less recognizable, a very simple one, because CImg is actually just a header file. All that needs to happen is that it's downloaded from the GitHub repository, and then the build scripts just move this file to the right folder. So that's CImg. If we look at exit wave reconstruction, the meta.yaml file in here is a lot more complicated. In the build requirements we have to define a compiler, we need CMake, we need a bunch of other stuff, and there are quite a lot of dependencies. All of these are available on conda-forge, so you just have to look them up on Anaconda Cloud, but creating this takes a lot of iteration, as you'll see later. The build script in this case basically calls CMake with the right CMake flags, then make install, and then moves a couple of binaries to the right place, where they can be called from the command line, and that would then effectively have
installed the package. So that's exit wave. If you were creating your own package, you would have created your branch, your folder, your meta.yaml and your build files, and uploaded them to your fork. Usually you would do this locally: you clone the repository, make a branch, and push it back up; that makes much more sense than using this web interface. Once you're happy with your recipe, the next step is to create a pull request to the master branch of conda-forge/staged-recipes. I've already done that, so I'm not going to go through it here. Let's go in here: this was the pull request I created for exit wave reconstruction. You can see it's from exit wave into conda-forge master, and it was already merged. In there you will find a handy checklist to check whether you really did everything correctly, so they're helping you a bit. Luckily, they also have a bunch of bots active on GitHub. There's a linter that checks whether there are mistakes in your recipe; for example, here it told me there was a space missing or something. Then you create a new commit to fix that, it tells you everything is fine, and then you basically just wait. Whenever you open a pull request, it automatically launches their continuous integration services on Azure. I can't show that on this pull request, but we can look at another one. If we go to conda-forge/staged-recipes, not issues, pull requests: these are pull requests by other people. You can see this one is not successful and this one is successful. Let's go in here. This is someone else trying to get some package into conda-forge, and this is what you would see in your own pull request. It's launching
the CI jobs; basically it's trying to build the package on the Azure servers, and if it doesn't work, you will see that. To see the details, you go in and check what's going on. Let's view more details on Azure: "bash exited with code 1". Let's go in and see where things went wrong. What's the problem here? Some test failed. So you can watch the build happening, and when errors pop up, you sometimes know what to do, and based on the errors or whatever feedback you get, you update your recipe. In this way it's a very iterative process, where you add and remove certain requirements, for example, to see whether the package will build on the server or not. Because I don't know much at all about C++, C or Fortran compilation, luckily I had my colleague Jan over here helping me out with the recipe; he suggested a couple of things to add to the meta.yaml file. You can see a bunch of commits constantly changing the recipe, quite a lot of them. Then, once it has built successfully and the little green check mark shows that the checks have passed, you ping the mods and say: hey, my recipe is ready, could you please merge it into the master branch? They will look at your recipe and might say they don't like certain things and ask you to change them. You make some changes, they may give some more comments, and after a while, if they're happy with it, they approve it and merge it into conda-forge master. When that happens, the first step is more or less done. What happens next is that after a while your
package will appear on Anaconda Cloud under conda-forge. In this case you can conda install exit wave reconstruction, my package, from the conda-forge channel. Here I have only made the recipe for Linux so far, and if you want to try making a package, I highly recommend building it for Linux first. They will accept a recipe that builds for only one platform, but if your builds keep failing on the other platforms, the mods won't accept it. So try to build for one platform first; Linux is usually the easiest to get working, because macOS and Windows have all kinds of additional weird caveats. Get Linux working first, and once that works, you can move on to the next platform if necessary. Anyway, once your package is on Anaconda Cloud, a feedstock repository will also have been created automatically under conda-forge, named after your package followed by -feedstock. (Hello, change of scene, because someone started vacuuming in my house and it was very distracting in the video.) So, we're back on our exit wave reconstruction feedstock. This is the repository where you make changes when something in the recipe needs to be updated or changed, which then updates the Anaconda Cloud package. It is created once your pull request has been merged, and you will see that you, or whoever you listed in the pull request, will be the maintainer. You will have full access to this repository; however, you should never directly edit or commit to any branch on it: not master, and don't create another branch either. Instead, let's say that in my recipe, where exit wave reconstruction is only built for Linux, I want to add a build for Windows or macOS. I would fork this repository, and in my fork I would make a branch
there, say "add Windows build" or whatever, make my edits there, and open a pull request to the master branch of this repository. The difference from the previous pull request is that now you are the maintainer who has to accept and merge it. The reason you don't want to push directly to this repository is that every time you do, the CI services are triggered and whatever is produced gets uploaded automatically to Anaconda Cloud. It's very wasteful if every experimental build gets uploaded, so you want to do your experimenting in the fork and only trigger a new build in the main repository once it works. The pull requests and all of that are the same: you will see the same Azure pipelines, and the main difference is that it's you who has to check that everything works. Now, if you don't really want to change anything, what's really helpful is that there are some handy bots to help you out. For example, this one automatically detects when new versions of the software become available: when someone pushes a new version tag to their GitHub repository, the bot picks it up and opens a pull request saying this software has a new version. The bot updates the necessary files automatically; here you can see it changed the version number and the SHA-256 checksum, and that's basically it. All of that happens more or less automatically, so you don't have to keep up with the versions yourself; all you have to do as maintainer of the feedstock is merge these pull requests. Can I quickly show that? Come on. Yes. So that's basically maintenance. To summarize, super short: to create a conda package in general, you install conda-build on your system and create a meta.yaml file,
and possibly, for a more complex project, build.sh and bld.bat. Then you run conda build with the package name to create the package on your system, and then you upload it. But that's not the way to do it, because then it ends up on some random channel, right? The correct way, creating a conda-forge package, is a bit more complicated but definitely more rewarding. You fork the conda-forge/staged-recipes repository, create a new branch on your fork, write the meta.yaml file to create the recipe, and then open a pull request to the master branch. You iterate, using the continuous integration services on Azure to see whether the build works, and once the mods approve your successful build, your package will appear on Anaconda Cloud in the conda-forge channel. To maintain your package, you keep an eye on the new feedstock repository that is created automatically, and you accept the update pull requests from the bots. I haven't had any issues with them, but if a build fails, you will of course have to intervene manually. And if you want to make changes yourself, like adding support for another operating system, you fork the feedstock repository and make pull requests. I definitely haven't gone into much detail on what to actually put in the meta.yaml file or in the build scripts, and for that you will have to read some documentation. If you're building a more complex package, I highly recommend going to the conda-forge documentation and reading some of the material under Knowledge Base; it can be very helpful. I don't know much about this at all, but it can help you construct your meta.yaml file. You can also read the conda-build documentation in general, and otherwise there's always the help you
might be able to get from the conda-forge Gitter channel. It's a very busy channel, but you might find some people who can help you out, with building a more complex package for example. And that's about it from me. I hope you found it somewhat useful, and I hope this gets you started on creating conda-forge packages. Thanks!
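The maintenance workflow described above relies on the bot bumping two things in a feedstock's meta.yaml: the version number and the sha256 checksum of the source archive. When a bot update fails and you have to intervene manually, it helps to know what those two edits are. The sketch below performs them by hand with Python's standard hashlib and re modules. The helper names and the regular expressions are assumptions that only handle the simple layout shown in the recipes above (one Jinja set-version line and one sha256 line); this is not the bot's actual implementation.

```python
import hashlib
import re

# Hypothetical helpers that reproduce, by hand, the two edits the version-update
# bot makes to a feedstock's meta.yaml: a new version string and a new sha256
# of the source archive. Illustration only, not the bot's real code.
_VERSION_RE = re.compile(r'(\{%\s*set\s+version\s*=\s*")[^"]*("\s*%\})')
_SHA256_RE = re.compile(r"^([ \t]*sha256:[ \t]*)[0-9a-fA-F]{64}[ \t]*$", re.MULTILINE)


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file (e.g. a downloaded source tarball), read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bump_recipe(meta_yaml: str, new_version: str, new_sha256: str) -> str:
    """Return meta.yaml text with the Jinja version and the sha256 replaced."""
    if not re.fullmatch(r"[0-9a-f]{64}", new_sha256):
        raise ValueError("expected a 64-character lowercase hex sha256")
    # Function replacements avoid any backslash/group-reference surprises.
    text, n_version = _VERSION_RE.subn(
        lambda m: m.group(1) + new_version + m.group(2), meta_yaml)
    text, n_sha = _SHA256_RE.subn(lambda m: m.group(1) + new_sha256, text)
    if n_version != 1 or n_sha != 1:
        raise ValueError("expected exactly one version line and one sha256 line")
    return text
```

In practice you would download the new release tarball, compute its checksum with sha256_of_file, and pass both values to bump_recipe; the build number would usually go back to 0 for a new version, which this sketch does not handle.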
Info
Channel: nickcorn93
Views: 9,644
Keywords: conda, conda-forge, tutorial, python, packaging, beginner, feedstock, science, scientific computing
Id: 8s5aj3sjuVE
Length: 41min 30sec (2490 seconds)
Published: Wed Dec 23 2020