Michał Karzyński - From Python script to Open Source Project

Video Statistics and Information

Captions
Hi, hello everybody. So you want to be a rock star? Don't worry, everybody secretly wants to be a rock star. What does it take to become one? Step one: you need to master your instrument, or multiple instruments if you play many of them. But there is a very important step two: you need to learn to play in a band. As a rock musician you are always on stage with other people, always playing in a team. So if you want to be a Python rock star, because that is your instrument, then you need to learn to play well with your band, which is your team of fellow programmers. There are things that can help you work together, play together, better: standards that we can all agree on, best practices that we have found work well for us, and tools that can check how well we are following those standards and best practices.

Hello, my name is Michał Karzyński, I come to you from Poland, as was said, and I work at Intel on a very interesting open-source project called nGraph, a deep learning graph compiler which I am sure you will be hearing a lot about in the coming years. In my spare time, though, I was playing around a little with OpenAI Gym. In case you don't know it, this is a reinforcement learning environment where you can teach an agent to solve puzzles or play games. What I found was that there wasn't a very good way to find out information about the environments you have installed locally. The environments have very good APIs that you can use to explore them, so I wrote a little tool that helps you do exactly that, and I am going to show you a little demo right now. It is just a command-line utility: you type something in, it shows you the environments you have installed on your computer, it helps you pick the right environment for yourself, and you
can watch a random agent play Space Invaders and die quite quickly. In the background you also get some information about the rewards it is collecting as it plays, which is helpful when you are developing your own algorithm in the Gym. So, okay, I had done that, and I thought: this is a very small thing, but it could be useful to others, so maybe I can release it as a package; maybe somebody will use it while preparing to write their own reinforcement learning algorithm for the Gym. Then I thought: if I want to release this as an open-source project, what will it take? I wanted to apply all the tools and best practices I had learned from working on a larger open-source project to this tiny open-source project, and these are all the things I found I needed to use.

My slides will be tagged with little bubbles. The first set of bubbles shows you stages: first you prepare your code, then you automate some of the things I will be talking about, and finally you put all of this into a nice continuous integration (CI) environment. The tags with a book icon show you references, things you can google for more information. The tags with no icon at all are names of packages you can pip install, and tags with a little external-link arrow are names of services I will be talking about, which will come up on some slides. That is the legend for the slides ahead.

Okay, let's start. If you want to write a command-line utility, you need to define your user interface, which in this case is your command-line interface. The expectation a user brings to a utility is that the command-line interface works like this: if you type the name of the utility with no other options, it'll give you this
one-line short description, a reminder of what the syntax of the command is. If you want to find out more, you pass --help and you get the longer description of the interface, and then you can provide values for the various options, with either long names or short names. That is what a good command-line interface looks like, and it is actually written up in published guidelines for command-line interfaces.

So how can you do this? Very easily, actually, with a package I like very much called docopt, which allows you to define your entire command-line interface just by writing a docstring for your script. You write this documentation once and it acts as the help text that comes up when the user passes --help, but it also becomes the input to the docopt() function provided by the library, which parses all the arguments, takes all the values the user supplied on the command line, and gives you back a dictionary of argument values that you can use directly in your script. So that is the first tool I want to recommend to you, docopt, though there are of course other ways to approach the same problem.

Okay, so now we have a script with a nice command-line interface. What is the next step? The next step is to put all of this into a project, into a package. The standard layout we have in Python for code that will go up on GitHub is this: the root directory is the directory for your entire package, and inside of that you have a README file, a setup.py file, and the source code, which can go either into a directory named the same as your module or, better yet, into a directory called src. Then you may have some tests and some docs. That is where your code goes. Okay, so that is ready; the next thing I was thinking about was what to do next.
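The docopt pattern described above can be sketched like this; the tool name, commands, and options are invented for illustration, and docopt itself is a third-party package (pip install docopt):

```python
"""gymview: explore locally installed OpenAI Gym environments.

Usage:
  gymview list
  gymview play <env_id> [--episodes=<n>]
  gymview (-h | --help)

Options:
  -h --help        Show this help text.
  --episodes=<n>   Number of episodes to run [default: 1].
"""
# The module docstring doubles as the --help text and the parser definition.
USAGE = __doc__


def parse_args(argv):
    """Parse command-line arguments with docopt, returning a plain dict."""
    from docopt import docopt  # imported lazily so the module loads without it
    return docopt(USAGE, argv=argv)


if __name__ == "__main__":
    import sys
    # docopt returns e.g. {'list': True, 'play': False, '<env_id>': None, ...}
    print(parse_args(sys.argv[1:]))
```

The key point is that the documentation is written once and serves both as help text and as the parser specification.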
I had to refactor my code a little bit, so that it is not just one big, long function that does everything, but something more maintainable. Maybe I will have contributors coming in, maybe they will want to add features; it is good to prepare the code in a shape we can all work on together later. Here I will just mention the standard I think we should all be following: the Clean Code guidelines from the famous book by Robert "Uncle Bob" Martin. The TL;DR of Clean Code is basically that you should write small, single-purpose functions with meaningful names and meaningfully named arguments. Each function serves a single responsibility, doesn't take many parameters (Uncle Bob says two is the most you should have), and preferably has no side effects. That allows you to write things you can easily test, so you should write unit tests for each one.

Okay, some refactoring done; on to the next step. A good practice we all follow is the construct where, in our module, we execute the main function only if __name__ equals "__main__". This actually does two things: it allows you to import the code from the module in another file, and refactoring the main logic into a separate function will come in handy very soon, when I talk about entry points in a second.

The next step is to prepare a setup.py file. This is the actual setup.py file that I wrote. It is not perfect; there was a talk yesterday by Mark Smith about preparing the perfect PyPI package, so you should check that out on YouTube afterwards if you haven't seen it. But this is basically all you have to write to get setuptools to package up your code, and the arrow points to a little trick you can use: if you have a README file written in Markdown, just use that as the long description for your package, which will later be displayed on PyPI, so I recommend doing that.
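The small-function refactoring and the if __name__ == "__main__" construct described above can be sketched like this (the function names and behavior are invented for illustration):

```python
def run(args):
    """Single-purpose main logic: importable from other modules and easy to unit test."""
    return f"exploring environment {args[0]}" if args else "no environment given"


def main():
    """Console entry point; also usable as a setuptools console_scripts target."""
    import sys
    print(run(sys.argv[1:]))


# Only runs when executed as a script, not when imported from another module.
if __name__ == "__main__":
    main()
```

Splitting main() out from run() keeps the testable logic free of sys.argv and print().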
If you already have a setup.py file, you can of course use it for its basic purpose: preparing your packages. You can build either a source package or a binary distribution package, a wheel, which you can then upload to PyPI. But you can and should also use setup.py during local development: inside a virtual environment that you created for working on your project, you can run setup.py with the develop option to install the package locally. Another way to do this, maybe even better, is pip install -e . (the dot indicates the current directory, where the setup.py file is), which actually calls setup.py develop through pip, but lets pip handle the dependencies as well.

Another thing setup.py allows you to do is define entry points, and this is a very useful feature of setuptools that not everybody takes advantage of. Entry points allow you to combine multiple packages into systems of plugins: you can have a main package and other packages that are plug-ins for it, all wired together through entry points. But a very simple use case for entry points is the console_scripts entry point, which simply creates a command: the name you declare there becomes the command your user will be able to call at the command line after they install your package, and the notation maps it to a specific function in a specific module. If you are writing a command-line utility, you should probably write a console_scripts entry in your setup.py file.

Okay, the next subject you need to take care of is requirements. This is a big subject which I will only be able to skim over due to time limitations.
The gist of it is that you need to provide a way for your users to set up an environment that resembles yours as closely as possible. The way I use requirements.txt is to provide a list of specific packages at the specific versions that I have tested the package with. This is very useful for your users: if it is not working for them, they can find out whether one of the dependencies is at a different version. The simplest way to create a requirements file is to use pip freeze, and the file can then be installed with pip install. I would also recommend separating the requirements you need for actually running and installing the package from the ones you only need for testing, because that will come in handy later when you are automating some CI processes. I am not going to get into Pipenv or other approaches to handling requirements, but you should look into those if you are curious.

Okay, the next best practice is official now, I think: we should all use Black. It is very easy to use: you just install it, run it on your source code, and it reformats the hell out of it, but it does so in a consistent way. You may not like it, but the way it works is consistent and we can all agree on it, and that is a huge value: we don't have to argue about how we are going to format commas at the ends of lines. There is a standardized way, so let's all just stick to it.

Black is just one formatter; you can actually use a number of them, and if you do, a very good practice is to run them together with pre-commit. pre-commit is a simple tool: you install it, and the first time you want to use it you run the command pre-commit install, which sets up a git pre-commit hook for running all of your code formatters. If you want to use pre-commit with Black, the configuration file on the left, which you should store in a special YAML file called .pre-commit-config.yaml, will download Black from the Internet and prepare it for running.
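A .pre-commit-config.yaml along the lines described might look like this; the pinned rev is illustrative, so pick a current Black release for a real project:

```yaml
# .pre-commit-config.yaml -- run Black on staged files before every commit
repos:
  - repo: https://github.com/psf/black
    rev: 19.3b0          # illustrative pin; use a current release
    hooks:
      - id: black
```

After `pre-commit install`, this hook runs automatically on `git commit`.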
Then the next time you want to commit a change, you type git commit, which triggers Black and runs it on all your files, and if anything is changed by Black, meaning it had to be reformatted, it prevents you from actually committing the change. So it is a useful tool for very quickly checking your formatting before you even commit the change.

Another good way to test whether you are actually following all the standards is to use a code linter. My favorite is flake8, but there are of course many others. What I like about flake8 is that it has the plug-in architecture I described before: flake8 is the main module that you install, but there are many, many other flake8 packages you can add on to it. This list has just the ones I like to use; you can find others. They can check not just compliance of your code with PEP 8, which is of course the standard we should all be following, but can also look for common bugs and mistakes, check the sorting of your imports with isort, and test other things you would like to have in your code, all with these flake8 plugins. It is very easy to configure: you can put your configuration in tox.ini, in the [flake8] section, and define values like line length. Now, because we should all use Black, the official line length became 88, since Black allows a 10% tolerance on the traditional 80-character line length, and you can exclude some checks from flake8 if you want by adding an ignore instruction there. If you then run the flake8 command, it loads all of these plugins, runs all of your source code through all of these tests, and informs you if something is amiss: not formatted correctly, or maybe a common bug or a security fault that you didn't notice somewhere in your code.
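The flake8 configuration in tox.ini described above might look like this; the ignored codes shown are a common choice when combining flake8 with Black, included here for illustration:

```ini
# tox.ini -- flake8 reads its settings from this section
[flake8]
max-line-length = 88
# E203 (whitespace before ':') and W503 (line break before binary operator)
# conflict with Black's formatting style, so they are commonly ignored:
ignore = E203, W503
exclude = .git, .tox, build, dist
```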
So this is very useful. Another useful check is mypy, together with the type annotations that are now available in Python 3. This takes a bit more work, because you actually have to add the type annotations to all of your code, but if you do it, it pays off, because you can do static type analysis of your code before you actually commit it. mypy will check whether anywhere in your entire code base you are calling something with the wrong type of argument, and this can sometimes find silly bugs that you really didn't mean to write, where somehow you are calling a function with the wrong variable, for example. Normally you would have to find that somewhere and fix it, but mypy can find these types of issues for you very quickly, without even running your code. So use mypy for this purpose, if you have the patience to put the type annotations everywhere, which I recommend.

Okay, so now we have some checking; how do we put it all together? The tool everybody is recommending these days is tox, and tox is very simple to configure. It can put all of your tests together into one thing. A simple tox configuration is shown in the box on the left: it defines a list of environments that will be tested, in this case Python 3.5, 3.6, and 3.7, and then the definition of the testing environment: the dependencies and the commands we want to run. Even some configuration sections for other tools can be put into this one tox.ini file. With this set up, all you have to do is run the tox command, or, if you want to run just a single environment, the tox command with the -e option and the name of an environment. tox starts by creating a virtual environment for that specific version of Python, installs all the dependencies into that environment, packages up your code and installs it into the virtual environment, and then runs the commands.
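A minimal tox.ini like the one described might look like this; the dependency list and source path are illustrative:

```ini
# tox.ini -- one `tox` command tests the package against several Pythons
[tox]
envlist = py35, py36, py37

[testenv]
deps =
    pytest
    flake8
commands =
    flake8 src
    pytest

# Configuration sections for other tools (e.g. [flake8]) can live in this
# same file, as mentioned in the talk.
```

Running `tox -e py37` would build and test the package in a fresh Python 3.7 virtual environment only.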
So all of your tests — here I have flake8 and pytest, but you can build all the tests you need on top of that — can be run with one call to tox. That is very useful, and it will come in handy in a second when we are putting all of this into a CI system.

But if we are going to test things, we need to write unit tests, and this is where refactoring the code into small functions comes in handy, because now you can write simple unit tests for each function. The test tool that I think is becoming more and more popular all the time is pytest. It is really easy to use: you can write tests with minimal boilerplate, just import your function, run it, and put some assert statements into a test, and you have a test. That is all you have to do, so it is easy to get started, and then you run all the tests with a simple call to the pytest command.

Okay, so now all the code is prepared, everything is done, and we are ready to share a pretty robust project with the community. What do we do? Well, of course, we put it up in a git repository. These days GitHub is king, but GitLab is of course a popular alternative, and there is Bitbucket as well, so I am not going to recommend just GitHub, but it is the one that has the best integration with all the tools I will be talking about from now on. You set up a git repository, put all your code into the repository, and push it. When you create the repository, remember to add a .gitignore file, and the license. The license is the thing that is easy to forget, but it is critical: if you don't put a license on your code, no one can use it. So add the license, set up the git repository, and then you can proceed to setting up a continuous integration environment. I like to use Travis, but as with everything, there are alternatives. Travis is easy to use because you just prepare another simple YAML file.
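A pytest test in the minimal style described might look like this; the function under test is a made-up example:

```python
# test_cli.py -- pytest needs no boilerplate: plain functions and asserts.

def format_reward(env_id, reward):
    """Tiny single-purpose function (illustrative), easy to test in isolation."""
    return f"{env_id}: total reward {reward:.1f}"


def test_format_reward():
    # pytest discovers any function named test_* and runs its asserts.
    assert format_reward("SpaceInvaders-v0", 105.0) == "SpaceInvaders-v0: total reward 105.0"
```

Running `pytest` in the project directory discovers and runs this test automatically.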
You just drop that YAML file into your repository, and if you use one like this, which calls tox, that one call to tox will run all your tests. If you do that, set up an account on Travis, and add your repository to that account, the testing will start and you will begin seeing little checks on all the PRs you make to your repository. These are very useful even for yourself: when you are writing code, you can put your own changes through the PR process and see whether they pass all the tests.

Another useful tool, available for free to anybody with an open-source repository on GitHub, is a requirements updater. I like to use the pyup bot, specifically for Python requirements, but there is also Dependabot, which is free for other languages as well. There is no configuration required: you just set it up by creating an account and giving it access to your repository, and then the bot will scan your requirements files and figure out whether they are up to date with the versions on PyPI. If they are not, it starts creating pull requests with updates to specific versions of packages, and if you have a CI process in place, you will know which ones you can merge and which ones you can't, because the ones you can merge have green checkmarks and the ones whose tests fail have a cross. So that is very useful.

Okay, another useful thing is to check your test coverage. pytest (via the pytest-cov plugin) and other Python unit-test libraries can actually check which lines of your source code were hit while running your test suite and then give you a report. This is very easy to use with pytest: you just add the --cov option and specify your module, and you get a report for your module. If you want more information, you can ask for an HTML report, which generates a code coverage report as HTML files that show you exactly which lines of your source code are tested and which are not.
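A minimal .travis.yml along the lines described in the talk might look like this; the Python versions listed are illustrative, and in practice you may also want the tox-travis plugin so each Travis Python version maps onto the matching tox environment:

```yaml
# .travis.yml -- let Travis provision Python and delegate all testing to tox
language: python
python:
  - "3.6"
  - "3.7"
install:
  - pip install tox
script:
  - tox
```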
So you can see where you still need to add tests. You can also integrate this with an online service that will track the test coverage over time and may even prevent you from merging changes that decrease the code coverage in your repository.

Another thing I want to mention is code review. If you are working as a team, the best thing you can do for each other is review each other's code. The people you work with have the chance, and you have the chance, to tell each other which parts of the music they are playing you really like and which you think should be a little better, and the moment to do this is during code review. But there are also services now, and I think they are getting better although they are far from perfect yet, that do automated code review. You can sign up for something like Codacy or Code Climate, and it will look at all the PRs in your repository, find things that may be wrong with the code, and leave code-review comments on your PRs.

Okay, another bot you can employ is one that will automatically merge PRs. Mergify.io is one that I recently set up: you can configure rules that apply to your PRs, and if a PR matches these rules, it gets merged automatically. For example, there is a configuration for automatically merging a PR that has passed CI and has at least one positive review; if your PR matches this, Mergify will merge it, and you can even set up another rule that deletes the old branch. If you have all of this in place, you actually have bots working for you: the pyup bot finds updates to packages on PyPI, Travis tests whether these updated packages pass your tests, and if everything passes, Mergify can merge these PRs. So without you doing anything, your project can be kept up to date with its dependencies from PyPI. I am getting to the end of my story now.
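A Mergify configuration along the lines described might look like this; the rule name and the exact status-check name are assumptions that must match your repository's checks, and Mergify's schema may have changed since this talk was given:

```yaml
# .mergify.yml -- auto-merge PRs that pass CI and have one approving review
pull_request_rules:
  - name: merge when CI passes and a reviewer approves
    conditions:
      - status-success=continuous-integration/travis-ci/pr   # illustrative check name
      - "#approved-reviews-by>=1"
    actions:
      merge:
        method: merge
      delete_head_branch: {}    # clean up the merged branch, as mentioned
```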
You are ready to publish your project on PyPI. This is very easy: we have a tool called twine, and once your packages are built, you can upload them with twine. You just need to set up an account on PyPI, and then your package is published, you are happy, and everybody can use it. I wrote up all the details, and I know this went fast, but everything I said is in an article on my blog, so you can read it at your own pace. [Applause]

Thank you so much, Michał, for this very interesting talk. Do we have questions for Michał?

Q: [not captured in the captions]
A: So yeah, I haven't set up automated documentation for this particular project, because there isn't much documentation, but I am torn between Sphinx and MkDocs. I am a fan of Markdown, so I don't like reStructuredText, which biases me against Sphinx, but I think Sphinx is very powerful and I have seen it used to good effect by people, so I guess it is great in the right hands.

Q: Recently I worked on automated versioning, and I came to bump2version together with a script I made myself. Do you know a tool that does this in an automated way, or do we have to do it ourselves at the moment?
A: That is a good question, and I don't think I have encountered a tool that actually does this, so far I have been doing it manually, but it would be a good thing to have. Thank you.

Q: Do you know if there is a way to install pre-commit globally, or as a git repo template? In my experience people forget to install pre-commit hooks when they start a new project.
A: Well, if they forget to install pre-commit, they will pay for it after they commit, because the PR will not pass the tests, so it benefits them to install it, and they will have motivation to do it at some point.

Do we have more questions? Raise your hands. Okay, if we don't have further questions for Michał, another round of applause for Michał. [Applause]
Info
Channel: EuroPython Conference
Views: 14,904
Id: 25P5apB4XWM
Length: 32min 36sec (1956 seconds)
Published: Mon Sep 23 2019