Publishing (Perfect) Python Packages on PyPI

Captions
Thanks very much. So yeah, the title is a slight joke. I started writing a relatively normal title and then I realized that it had a lot of P's in it, so I thought I would add a few more just to make it difficult for the session chairs to pronounce. So, sorry.

OK, there's a lot to get through, so I will hurry on quite quickly. In a moment I'm going to show you how to build perfect Python packages from scratch. I won't be taking questions because, honestly, this is a tight fit for 30 minutes anyway. But I have to talk about something first that's more important than Python packaging, and that's me.

My name is not judy2k, but this is my handle more or less everywhere online. Feel free to follow me on Twitter; I tweet about Python and Brexit, occasionally angrily. My real name is Mark Smith. I'm a developer advocate for Nexmo, one of the conference sponsors, and I would be remiss if I didn't at least briefly talk about Nexmo. We are a software-as-a-service company. We offer REST APIs that allow you to do telecommunications and various other forms of communication, including video streaming, to mobile devices, your web apps and things like that. If that sounds interesting, come and talk to us at the Nexmo booth, or send me a tweet, or come and talk to me around the conference. I'm happy to talk about that another time.

So now that I've talked about Nexmo, let me talk about JavaScript. In March 2016 a developer removed a library called left-pad from npm, the Node.js equivalent of PyPI. It's a big web service that holds packages; when you need to pull down those packages and install them into your JavaScript application, you run npm and it downloads and installs them so that you can access them from your program. Removing left-pad broke lots of packages that depended on it, and it turns out that quite a lot of packages on npm depended on left-pad. In fact, it had been downloaded two million, four hundred and eighty-six thousand, six hundred and
ninety-six times in the months before it was removed from the repository. It was just one function, eleven lines of code that padded a string to a certain length by adding characters to the start of that string. Lots of people thought that this made the JavaScript community look kind of silly: why would anybody publish a package that only consisted of 11 lines of code? Why would anybody use a package that only consisted of 11 lines of code?

Now, I don't think it made the JavaScript community look silly. I think it actually made the JavaScript community look pretty awesome. Obviously there are some problems with the fact that people can remove a package that everybody is depending on and break the entire JavaScript community's software. But in general, why was this package published? (If you don't agree that this made the JavaScript community look awesome, feel free to fight me on Twitter.)

left-pad is not a problem; left-pad was a solution. Because the JavaScript standard library is relatively small and doesn't contain lots of useful functions that, say, the Python standard library does contain, like the ability to left-pad a string with some characters, a developer wrote a solution for it himself. And because npm makes it easy to publish small items of code, he did, and he made that code available to other developers so that they didn't have to reinvent the wheel. That means people can fix bugs in one location, they can submit PRs to improve these 11 lines of code, and ultimately people can, well, ideally, depend upon it.

The alternative to this is copy and paste, and copy and paste, I think you can agree, is not how you should share your code. Because it's really easy to share with npm, people do, even with really small libraries. I don't think that's really the case with Python, because people find Python packaging slightly fiddly. There are some slightly odd things, which we'll go through,
about pushing your first Python package to PyPI. (I made it difficult for myself to say some of these sentences as well.) So people are a bit afraid of setup.py. There is good documentation out there; lots of it conflicts with each other, but there is, I think, a growing movement to bring some of this together and make it easier to find current best practices. Hopefully this talk will help a little bit.

What I really want everybody in this room to feel at the end of this talk is confident to publish packages to PyPI: relatively small packages, ideally pure Python packages, which is what the example is going to be. So I would like you all to make a package, and what I am going to do in this talk is show you how.

The assumption is that you already have some code which is general enough, or close to general enough, that you think it would be useful for other people. So let's take some useful general-purpose code, like a function that prints "Hello, World". Now, who in this room hasn't written this program in some form? Exactly. That is totally wasted time: everybody in this room has written this program. Wouldn't it be so much better if you could pip install it and call that code from somewhere else? So that's what we're going to do. Also note there's an f-string here. f-strings are awesome; you should use them.

So the idea is that you've written some code that you're proud of and you would like to share it with the world. The first thing to do is extract it from your code base so that it is independent of that code base. That's your problem; I'm not going to show you how to do that. In this case we're going to extract this code into a file called helloworld.py, which is a Python module. Next, we're going to put that module in a src directory, and I will explain towards the end of the talk why we did that. (Hynek is excitedly clapping for the second time today. Awkward.) So I'll explain why later, but for the moment just
take this as correct practice. Then, in the directory that contains our src directory, we are going to create our setup.py file. You can open it up in your favorite text editor or Python IDE, and we will enter something like this into the file.

The first thing to note here is that we're importing from setuptools and not distutils. You will still find documentation online that recommends importing from distutils. Don't do that: it's not that powerful compared to setuptools. pip is already distributed with setuptools, so if you are installing packages with pip you have setuptools; it's not a third-party dependency.

Underneath that we call the setup function that we've imported from the setuptools module. It's a bit weird, but for now don't think of it as a function call; think of it as configuration. Each of those parameters is essentially a line of configuration that you are giving to pip to tell it how to install your package. This is pretty much the bare minimum setup information you need to provide.

We start with a name. The name is what you pip install, so this is the name on PyPI that the package will be uploaded under and that people will install. It doesn't have to be the name of the Python code that people will import; it's a separate thing. Usually they're the same, sometimes they're different.

We need to pick a version number. Here I've just started with 0.0.1; 0.0.x version numbers imply that it's unstable. There is a good chance that the first few times you upload this to PyPI there will be a minor packaging mistake, so this is a good stage to start uploading packages to PyPI, while you've still got this unstable version number. You're not worried about people seeing instability, and it's not actually your code base that is unstable; it's your packaging configuration.

Then we have a description. This is usually a one-liner. At
this point it's "Say hello", which is not a very useful description, but we'll leave it at that. Then we have py_modules, which is a list of the actual Python code modules. We have a file called helloworld.py, so here we're saying this is the code that we want to distribute. That's what people import, not what they pip install. And finally (again, this is a kind of cargo-cult, copy-and-paste piece of setup config) we have this package_dir line, which is a map from the empty string to src. All that is doing is telling setuptools that our code is under a src directory, so don't worry: put it in your code and forget about it after that.

So now we've defined a package, so let's build it; potentially we could distribute it. We run the setup file we've just created with the bdist_wheel command. This tells it to create a wheel file, which is something that is appropriate for uploading to PyPI. It will spit out a load of output, most of which I've deleted from here, but the line that I've highlighted in bold is the one that's important. What that's saying is that it has just copied our helloworld.py code file into the lib directory, which means that it will end up in our wheel. If that's not there then essentially we'll have an empty wheel file and it won't work.

We can now look at what's been created as part of that bdist_wheel command. Here are the directories and files. Remember, we've only actually created two files so far, helloworld.py and setup.py, so everything else here was created by setuptools. It's created an egg-info directory in our src directory. You'll want to gitignore this; I'll show you how to do that in a moment. This is horrible, I wish it didn't do this, and I'm going to ignore it from now on. Then we have a build directory. This is where setuptools moved our files to in the process of building our wheel file. You will see, well, the arrow is not in
the right place, but you will see the helloworld.py file in there, so again we've validated that our code is actually going to be in our wheel. And finally we've got the actual wheel file here, which it has put in our dist directory. That is our final distribution.

So now we can install it locally. This is effectively testing our packaging: it's not testing our code, it's testing our setup.py file. Here we run pip install with the -e flag and a dot (full stop, period, whatever you want to call it). This can be confusing if you haven't seen it before. Just out of interest, who hasn't seen this before? Yeah, I thought so. This is actually an essential command if you are building Python packages, or rather they're essential flags.

Normally when you install a Python package, pip installs it into the site-packages folder inside your Python distribution; it copies the code there. We don't really want to do that while we're working on our project. We want it to just work with the code that's in our src directory, and that's what -e does: it essentially links to the code that you're working on instead of copying the code into another location. That means that once we've installed this package we can continue to work with it, continue to run it and continue to write code against it without having two copies of our code, which is just going to cause us problems further down the line.

The full stop at the end means "install the package in the current directory", so pip is looking at the setup.py file there. The whole command says: install this package by linking to the code that I'm working on. You will run this every so often. Every time you change your setup.py you essentially run this again to make sure that your package is installing correctly and that your Python code is available to you. Bear in mind that our code is under a src directory at the moment, so if we run Python in the current directory we can't import helloworld yet
because it's not on our path. So let's test it. Here we run Python, the REPL, and then we do our from helloworld import say_hello. Because we've just installed our code into our current virtual environment with pip, this will work now: even though our code is under the src directory, Python has been told where our code is by the setup.py file. We can execute the say_hello function, and we can execute it passing the optional parameter; everything works as we would hope. That's rough testing, and we'll get on to better testing in a moment, but it is a confirmation that our code is installing correctly.

At this point we have a working package with some useful code in it, so we could upload it to PyPI immediately. But I would say there are a few things that we really need to do before we get to that point: documentation and testing, but also a little bit of housekeeping, which I'll run through now.

As I said, some files get created that you really don't want to add to your Git repository, so it's useful to have a .gitignore file. This website is fantastic. It makes it easy to get hold of the standard .gitignore files that GitHub publishes for different language and operating system communities. You write Python in that text box, hit Create, and it will spit out a text file into the web browser that you can copy and paste into a .gitignore file. Now we're ignoring all the main files that Python creates, which will stop you from uploading .pyc files and a bunch of other artifacts of a Python project.

If we're going to publish this code we also need a license. If we don't have a license, we haven't given people permission to run our code: they can look at it, but they can't copy it or use it, which is not greatly useful. So we need a LICENSE.txt file. And if you don't know the ins and outs of the different licenses and the restrictions and freedoms they grant the software
that you're publishing, this website, choosealicense.com, is incredibly useful. It essentially asks you lots of questions and then gives you your options and how they compare to each other. It's a human way for non-legal people to understand the differences between software licenses.

We need to add some classifiers to our setup.py file so that people can find the project on PyPI by searching or filtering on common criteria. Here we say that this is Python 3 code and that it runs under Python 3.6 and 3.7. We haven't really tested that yet, but we know that it only runs under those versions of Python because there's an f-string in the code. For the license I chose the GPLv2, so I put that in there; that was a bit of an arbitrary choice. The classifiers can be looked up at the URL at the bottom, pypi.org/classifiers. There's a bunch of them. Try to apply all the useful classifiers to your project so that you're describing what the project is for and how it's used.

Then you need some documentation. But before you write documentation you need to work out what format you're going to write it in, and you basically have two choices at the moment. One is reStructuredText, which is written in Python and is used widely in the Python community: the core documentation is written in it, and a whole bunch of the libraries you use have documentation written in reStructuredText. But it is a Python solution. If you're working on a project that has some Python code and maybe some Rust code or some C code, those people will probably not have encountered reStructuredText before, but they will probably have encountered Markdown. Markdown is a valid choice. It is simpler but also less powerful, so you're making some compromises here. Both of them allow you to use tools, Sphinx for reStructuredText or MkDocs for Markdown, to compile a directory of Markdown or reST files into a
directory of documentation that's all linked together, and both of these are supported by Read the Docs, so you can publish either kind of documentation site to Read the Docs and not have to worry about hosting it yourself.

Once we've decided (and I've chosen Markdown, again kind of arbitrarily) we need to write a README. That's pretty much essential for any modern project. Here we have the title of the project and a small paragraph describing what the project does. We should have a section describing how to install the project, with some sample command-line code for pip installing it, and some sample code to tell people how to use the useful code that we publish to PyPI.

Once we've written this, it's nice to have it published on PyPI as well. As well as publishing on, say, GitHub or GitLab or wherever you're publishing your code, it would be really nice if we could make this the official description of our project, and we can do that. Even if you've published packages before and used reStructuredText to write your README, this is a new feature: as of about a year ago, PyPI supports Markdown directly, so you don't need to convert your Markdown to reStructuredText before pushing it up to PyPI.

Here we're taking advantage of the fact that the setup.py file is code and not configuration by opening the README file and reading in this block of Markdown, and then we apply that string to our setup call. We use the long_description parameter to provide the string value that we put into a variable. Then, very importantly, we need to tell PyPI that this is Markdown and not reStructuredText, which we do by providing this MIME type as the long_description_content_type parameter at the end.

I wanted to talk about dependencies. I've cut this talk down a bit, so we won't actually show any code that uses blessings, but for example, if we used this
terminal coloring library called blessings, this is how we would add it to our setup.py file. We add an install_requires parameter. It's a list of specifiers that describe the library and the versions we're prepared to accept; I will talk a little bit more about those in a moment. If we change the library dependencies or anything else, as I said, we should run our pip install -e . command again to reinstall the package and make sure it actually pulls down the dependencies and that things at least work together.

Then we should run some tests. But we don't have tests, and we shouldn't just keep opening up the REPL and randomly calling functions to make sure that things work. I would recommend you write your tests with pytest. pytest is awesome. But in order to write tests with pytest we again need more dependencies. This time we're not talking about a dependency of our library, like blessings; we're not saying this is needed to run. We're saying this is a development dependency, something people need to install in order to develop code with our library.

To declare dev dependencies, I recommend you add them as extras in your setup.py. A lot of people here, I suspect, are using requirements.txt for this. If you have a setup.py file, I would argue you do not need a requirements.txt: you can do all of this within the standard packaging framework, and you get some advantages because, again, this is code, not configuration.

The way extras work looks a little bit like install_requires, but with an extra layer of indirection. You'll see that it's a dictionary rather than a list, but you still see that list in there as the first value. The key is the name of your extra, so in this case we're saying dev, and we will tell people that they need to install the dev extras in order to work on our project. After that it is just a list of dependencies. In this
case we're requiring pytest at version 3.7 or above. Then we can tell people how to use it, so again we update the README. We have a section saying: if you would like to help develop helloworld, this is how you install the development dependencies so that you can run the tests. It looks very similar to before, but you'll see we have the word dev in square brackets afterwards. That's saying we're installing our current module with the dev extras. You may have used this with other packages and maybe not seen how it was specified. I stole this straight from attrs, which I think is why Hynek is here. If we install the extras, you will see that it installs a whole bunch of other stuff, basically dependencies of pytest.

The difference between install_requires and extras_require is that install_requires is for production dependencies, things like Flask, Click, NumPy and pandas, and the versions should be as relaxed as they possibly can be. You should be testing against multiple versions of each dependency. In this way you're not locking your users in to a specific version of a shared dependency. If both you and your user are using attrs, ideally you need an overlap there: if you're using version 3 and they're using version 4, they're not going to be able to use your package unless they make some changes.

extras_require is different. It's for optional requirements, for development or testing or whatever groups of extras you want to create, and the versions should be, in my opinion, as specific as possible. You're trying to get your developers up and running as quickly as possible, and creating an environment identical to yours and to that of the other developers who have been working with the code is just going to make everybody's life easier, rather than trying to debug minor variations in your dev dependencies.

requirements.txt still has a place, but I would argue it's for apps that you
deploy onto machines that you control. In that case you're pinning every single production requirement to a specific version so that you're producing a well-tested collection of code on a destination machine. So you use fixed version numbers with the double-equals operator, and you use pip freeze to spit out all of the things that are currently installed straight into your requirements.txt.

So here we write some tests. I'm going to zoom through this a bit because I'm running slower than I would like. Now we just need to run pytest to actually test our code each time, which is much easier than executing the code by hand. It will spit out a bunch of stuff to say what version of Python you're using and things like that, and then hopefully it will tell you that your tests passed.

This is what we've produced so far: a license file, a README file, a setup file, a src directory with our code, and a test. You can obviously stick your tests in a tests directory if you have more than one test file.

It's good to distribute source distributions as well as binary distributions, for various reasons: people can check the code before they run it, they may not have access to GitHub to get the code, or they may just need to verify the code before they run it. When you run sdist against our setup.py file we actually get some warnings saying it would like some more data. For some reason sdist would really like to know the maintainer and maintainer email, or the author and author email, so it's told us that and we can just add those in. That's three lines: the URL of the project, linked to GitHub in this case, my name and my email address.

Excuse me. So now we need to make sure that the source distribution contains all the files we want it to. When you run sdist it creates, in this case, a gzipped tarball, and we can use the tar command to unpack that and have a look at the
stuff inside. When we have a look at it, we notice it hasn't got our LICENSE.txt file or our test_helloworld.py file. Ideally a source distribution should contain everything that is in this snapshot of code: everything we're distributing and everything that gets built into the binary distribution. In order to add those missing files to our source distribution we need to write a MANIFEST.in file. They are fiddly and annoying.

Fortunately there's a tool called check-manifest that does pretty much all of this for us, or at least will get us started quickly. You can pip install it, and you can add it to your development dependencies if you like. You run it for the first time with the --create flag and it will create your MANIFEST.in. I recommend having a look at it; it's just things like include and exclude lines for various files that it has found in the project. It tries to make sure that everything you have in Git ends up in your source distribution, so it's finding these files and adding them to the MANIFEST.in file. Then if we build our source distribution again and unpack it, we see that, just out of the box, check-manifest has created a manifest file that includes the files that were missing.

So now let's publish it. It's good to publish earlier rather than later: if you try to perfect everything you will never publish the package. As soon as you have something useful, not necessarily perfect, try to get it up. Apart from anything else, it will register your package name on PyPI to your project, so you're not letting somebody else come in and take it in the months while you're working on your project. You used to be able to register your project before you uploaded code; now you need to actually upload code in order to register a name.

So here we run setup.py with the bdist_wheel and sdist commands, and our dist directory will now have a wheel file and our source distribution in it. In order to push to PyPI you need to
use twine, for various reasons. It separates the build step from the upload step, which means that you can do manual checks of your distribution files before you upload to PyPI. Otherwise it's a single command to build and push up your code, and if you get it wrong it's going to mess things up for you. So here we install twine and use the twine upload command. It also uses HTTPS, whereas for a while setuptools didn't, so it's safer.

If you get to PyPI, the website, quickly enough, you will see the name of your project on the home page as the most recently updated package, and that's kind of cool. If you click on it you get to the project page, where you can see our README file is essentially duplicated. There's a GitHub link that I've just cut off at the bottom. I had to change the name of the project, by the way, because there is already a helloworld package on PyPI; obviously somebody has done that.

There's still some more stuff that we need to do. I'm running out of time, so I apologize for running through these. I recommend using tox for testing against different distributions of Python and different versions of the libraries that you depend on. Here we're just testing against Python 3.6 and 3.7. You install tox, you have a tox.ini configuration file, and it spits out loads of output when you run tox. Hopefully at the end, for each one of your targets, you get "commands succeeded" and a little smiley face, which I always think is rather lovely.

Here's why we use the src directory. The root directory is the directory I've been working in. If our code was in this directory and we imported helloworld while running the tests, it would run the code in our current directory. But we don't want it to do that; we want it to test installing the package and using the code from there. By having a src directory you are forcing it to use the version that was just installed
into the virtual environment.

You should also build and test on clean machines. In the past I've used Travis for this. I am probably moving my stuff to Azure Pipelines, depending on when it gets its stuff stabilized; I won't talk about that any more.

For extra credit, you can add badges to your README for code coverage and quality metrics. You can manage versions with bumpversion, which is quite nice. You can test on different operating systems. You can write more documentation; you can always write more documentation and tests. You can add a contributing section to your README. You can implement a code of conduct. There's lots that you can do.

But I recommend that you don't do any of the stuff that I've described in this talk. There's a project called cookiecutter that generates sets of files from templates, and people have already created template projects for Python projects. So you install cookiecutter and then you run this command to download a template. There are a few of them out there; I quite like Ionel's, which is similar to my own way of thinking about these things. cookiecutter will download the template from GitHub and ask you lots of questions, because it's much more flexible than what I've shown: I've given you one option for each step, and it offers you lots of options, different testing libraries and things like that. At the end of it you're done, in theory. You will probably have to go and tweak some of these files because they won't be quite the way you want, but it took me five minutes to get up and running using this process, and it created all of this.

You will recognize some of the stuff in here from the tutorial we've been through, and there is extra stuff in there too. There is a Sphinx directory of documentation with just boilerplate in it at the moment, waiting for you to fill it out. You copy in your code and then you're done. That took me about five minutes. I could
have cut this talk down to the last two slides if I'd wanted, instead of wasting all of your time for half an hour. But hopefully this gives you an overview of good packaging practice and all the things that you need to do to build a well-rounded package. There are obviously different directions you can move in, but this is a really good core understanding of the things you should do for a professionally released package.

There are other projects for distributing libraries these days. They're very interesting: they don't use setup.py, or they use it in different ways. I would really recommend having a look at them if you're struggling with setup.py. Poetry is getting a lot of mindshare at the moment. I haven't really used them, so I can't really recommend them; I'm trying to push the current, most common best practice.

If you are interested in the slides or the code for this talk, they are available at this bit.ly link. Follow me on Twitter. If you have any questions, feel free to grab me at the conference, come to the Nexmo booth, or tweet at me, preferably not abuse. Thank you very much for coming to my talk. [Applause]
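The slides are not part of the captions, so the following is only a rough sketch of the code the talk describes, not the speaker's actual files. It assumes the module is helloworld.py with a say_hello function that takes an optional name. The greeting wording, the blessings version bound and the helper name setup_kwargs are illustrative assumptions. The remaining values (name, version, py_modules, package_dir, the classifiers, the Markdown content type and the dev extra with pytest 3.7 or above) follow what is said in the talk.

```python
"""Sketch of the talk's example: a tiny module plus its setup() configuration."""


def say_hello(name=None):
    """Return a greeting; the exact wording is an assumption, not from the captions."""
    if name is None:
        return "Hello, World!"
    # The f-string is why the talk lists only Python 3.6 and 3.7 as supported.
    return f"Hello, {name}!"


def setup_kwargs(long_description):
    """Collect the keyword arguments the talk describes passing to setup().

    setup_kwargs is a hypothetical helper, used here so the sketch can be
    run and inspected without building a package.
    """
    return {
        # What users `pip install` (the talk notes this name was already taken).
        "name": "helloworld",
        # A 0.0.x version number signals an unstable release.
        "version": "0.0.1",
        "description": "Say hello!",
        # What users `import`.
        "py_modules": ["helloworld"],
        # Tells setuptools that the code lives under src/.
        "package_dir": {"": "src"},
        "classifiers": [
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.6",
            "Programming Language :: Python :: 3.7",
            "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
        ],
        # README text, rendered by PyPI as Markdown.
        "long_description": long_description,
        "long_description_content_type": "text/markdown",
        # Runtime dependency: keep the range relaxed (this bound is illustrative).
        "install_requires": ["blessings>=1.7"],
        # Development-only extra, installed with `pip install -e .[dev]`.
        "extras_require": {"dev": ["pytest>=3.7"]},
    }
```

In a real project the function would live in src/helloworld.py, and setup.py would read README.md into a string and call `setup(**setup_kwargs(readme_text))` after `from setuptools import setup`, or simply pass the same keywords directly.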
Info
Channel: Coding Tech
Views: 52,823
Rating: 4.9345121 out of 5
Keywords: python, packaging, pypi
Id: GIF3LaRqgXo
Length: 29min 27sec (1767 seconds)
Published: Mon Jan 06 2020