Emacs + org-mode + python in reproducible research; SciPy 2013 Presentation

Video Statistics and Information

Captions
I'm an assistant professor now in chemical engineering at Carnegie Mellon University, and this is my first time at SciPy. I'm really impressed with the way things are going and the interaction. I want to talk about some work I've been doing in this area, and I want to motivate it with a couple of problems I've been trying to solve for quite a while.

One is that I have, say, a computational research workflow where I set up a lot of calculations, maybe hundreds or thousands, and that's typically done in a loop, maybe in one script. Then I'll write a script that runs those calculations, and inevitably one of those hundreds will have some problem, and I write a second script to fix that problem. So I get a script two, a script three, and maybe some more, and then I have analysis scripts to do various things, make various plots. You can see them all kind of highlighted there. And then what I might want to do later is try to teach one of my students how to do this, but then I open this directory and it's, you know, all of these files that I have to try to figure out, or I try to repeat this myself. So that's been one kind of problem: how do I document all of these things that I've done, and even that my students have done?

Another problem is, as I'm writing, I might want to integrate some math. So here I have an integral that I'm defining and then implementing in code, and then I have some output down here, and I want to document this for some technical document, for a manuscript, for a blog post, something like that. What I don't want to do is have to cut and paste code from a script into this document, then run it and cut and paste the output. To me that's just tedious, which means it's likely to be error-prone: I'll change code and forget to change the output, or something like that. And because it's tedious, I'm also not likely to do it, so this content would never even get generated in the first place. So this is something I want to
be able to do all the time, so I've been trying to figure out an easy way, a non-tedious way, to do that.

The third problem I'll call "how did I make this figure?" This is a figure from a paper we wrote a couple of years ago. I really like it, but where's the script that made that figure? Where's the data from? How did I make that data? There's obviously a lot of data in it. And how would I include that data in another figure or another paper? This is already a few years old and hard to find, and that's a problem that probably many people have, and that I've been trying to solve.

When I think about these things, these problems all have related solutions. Number one, documenting a computational workflow: I could solve that if there was a way I could have a document that had all of the commands, all of the scripts that I ran, in there, organized in a way I knew how to read, where I could run them from that document and capture the output. That would really solve my first problem. The second problem would really be solved if I could integrate and intersperse text and data and code all in the same document, where I could run the code and capture the output right in the document. And the third problem, of course, would be solved if in the manuscript itself there was actually the script and data that ran to make that figure.

So that's what I'm going to tell you I have today: a solution for all three of these problems. The solution involves an editor. The editor has to know code, data, and text, and it has to be able to interact with the system, so any editor with those properties could be used the way I'm going to show you. You need a markup language, because you have to be able to differentiate code from text from data, and so I'll talk about a particular markup language that we use. And of course you need convenient programming, which for the most part is going to be Python, but it isn't specific to Python.
And so I'm going to talk about Emacs plus org-mode plus Python as the solution to these problems, and show you how these things go together.

So let me just give you Emacs in a nutshell; I'm probably doing it a great disservice. It's a very old editor. It's completely extensible, in a full programming language called Emacs Lisp. You can customize almost every aspect of the editor, so you can add features, you can do anything you want. It operates in modes that provide features: if you're in Python mode, then it knows how to indent Python code, it knows how to run Python code, it knows how to do many things with Python. If you're in reStructuredText mode, it knows the reStructuredText syntax and it helps you write it. If you're in LaTeX mode, it knows LaTeX syntax, and it does all of these things for you. But more important than those things (many editors do that), what Emacs also provides is really complete integration with the operating system. It can access code, it can execute code, it can make code, it can compile code, it can capture the standard output and the standard error of a running process, and it can insert that back into your buffer. That's an important feature that an editor needs to have to do what I'm going to show you.

Now, how many people know org-mode? A good fraction, that's good. Org-mode is a major mode in Emacs. It's really kind of two things. On one hand it is a syntax, a markup language like Markdown or reStructuredText, so there's a syntax that tells you how to make different kinds of elements: how to make headlines and sub-headlines, how to differentiate code from data, etc. It's primarily, well, I shouldn't say primarily, it's an outline mode. I'll show you all of these things that it does. It does task management, and what's interesting about it is you can embed arbitrary LaTeX, arbitrary HTML, you can embed code in it, and then you can execute the code right in the buffer. And so you can do other things
like make navigable links, and then you can take your org-mode file and export it into some format that you like. If you like LaTeX, you export it as LaTeX; if you like PDF, you export to LaTeX and make a PDF; you can export it as HTML, as Markdown, whatever. So for example, this presentation, I'll show you, was written in org-mode, and you can embed files in this PDF. This is what the org-mode looks like for this particular presentation. So we are right here; we expand this, and these are all of the bullets that are on that screen. To build this I just run a command down here: I just click on this link here, and it runs that command and it exports the Beamer PDF and generates the slides that I'm looking at.

OK, so some examples of things you can do. These are simple examples. On the top I have a single shell script that lists the contents of a directory and pipes it to sort. I've captured the output below, so you can see what directories and files are in there. This is a shell script; this is Python code, so the same thing: import os, list the directory, sort the files, print them. This is syntax highlighted. You can also do Emacs Lisp if you like to do that. So you can embed all kinds of languages into this file and get what you want captured in here. None of this is cut and pasted; I literally typed the code in and then ran it.

All right, so I'm going to talk about a couple of projects beyond these toy examples that illustrate what we can do with this. One of them is called PyCSE. I'm a chemical engineering professor; I have to teach my students how to do computational scientific and engineering calculations, and I'm trying to do that in Python. So I'm going to talk about the PyCSE project, which I wrote completely in org-mode, all of the sections that go into this Python blog, so each blog post is written in org-mode and pushed out to my GitHub blog; a project called DFT book, where I teach a class in quantum chemistry and molecular modeling using Python to drive
those calculations; and we're also now actually writing scientific manuscripts in org-mode and exporting them.

OK, so the first thing: I'll show you some screenshots. This is what an org file looks like in my Emacs buffer. This is the collapsed view; if you look down here you can see there's something like close to 30,000 lines in this little collapsed view. Each of these headings is like a chapter. You can expand it out and you get to a subsection here where you can see the title. This is information about the blog post that's been made. You can have inline rendered LaTeX in your buffer. Here's some code and the matplotlib figure, all inline, that shows in the buffer. So all of this is executed in the buffer: every line of code here has been run and the output has been captured, so I know they're always consistent. I can export this into a PDF or into HTML; those are all hosted on the GitHub site. I've played around with some things like Mobi and EPUB; those are not quite as simple to do.

All right, another example, in trying to understand how you want to integrate... Yeah? Yeah. No. Well, so, no, the answer is no. There is a whole export engine that knows how to handle all of the org elements. Each one of these stars here is a section, and here you see three stars, that's a sub-subsection. So it's automatically exported in a fairly simple way that's pretty transparent. I'll show you what it looks like before we finish.

OK, so here's an example where I try to explain to my students how to perform a particular kind of calculation, where you want to calculate, say, the adsorption energy. This is a schematic of a surface; a molecule adsorbs on the surface. Down here is Python code that shows you take the energy of this, minus the energy of this, minus the energy of this. Here's the captured output, so when students try this they should get exactly these same numbers if they've done it the same way that I did. So this is
some 300-ish pages, again, and there's no cut-and-paste code in here, or at least no cut-and-paste results.

Another interesting thing we can do is have code that generates data. Up here is the tail end of a function that printed out this table. This table has a name, composition-volume, and I can use this table as a data source in another code block. So if I want to have a series of things going on, then down here I use data equals composition-volume, and I can extract out the columns of this table and make a plot over here.

So I'm going to take a brief minute here and do some demos. Again, you can have clickable links. Here's an example of some Python code. Just to holler out to Enthought, you can see I'm using the Enthought Canopy Python distribution directly from Emacs. You put your cursor in here, Emacs knows that it's in a code block now, and I just type Control-C Control-C, and that's that. I can also look here: this is the Python computations in science and engineering. This is the short collapsed version, which has 26,900 lines. If I go back to my buffer here, you can click on this if you want to see the actual LaTeX; you can render it right in the buffer to see if it's correct. Down here is the code that we used, and then this is the figure that is there, and this gets rendered into PDF like so. So here is the PDF. You can look over here and see the code; there are line numbers in here so you can refer to these. The math gets rendered nicely, and down here is the actual figure.

I mentioned that you can export this to HTML. So here is Emacs Lisp; there's about 275 lines that parse the org file and generate HTML that is compatible with the Blogofile blog system. What that does is just look through here: it gets the title from headings, it generates the YAML heading, then it generates the HTML, and then it copies the HTML to the right directory. And so all of those posts get put here on this
GitHub site, and so you can click on categories. For example, I like this first heat transfer example. All of this was done in org-mode and then exported to GitHub, and so you can see the math, the Python code, different versions of solutions, and even animated GIFs that show how the solution works.

All right, so I want to show one, maybe two, final examples of things that are interesting to me about this. This is a manuscript that we have recently written and submitted. There are interesting things like these links that are cite:something. If I click on this, it opens the BibTeX file directly in the place that I want. This is why I call Emacs a browser for text, because I can do things like that. The supporting information file is where this really shines. Here we can illustrate how to attach files: we have an Excel sheet that's directly embedded in here, and we can use that data inline. Here, for example, is a table; we can use that as a data source for a Python script. Somebody mentioned earlier, at the image competition, that you tend to spend a lot of time customizing your figures, so that's what happens here. I'll just skip to how it gets rendered in the PDF. You can see that right here there might be an Excel file that you might want to double-click and open. So this will be a supporting information file that's generated from my org file.

All right, so the last thing I want to show is this DFT book. This is the thing that I showed you before. One of the reasons why you might want to have a deeply integrated editor is you can do things like create a little menu here. I'll just click on "get TODO agenda", which is some code I wrote that finds all the elements of things to do. So here I can click on this one, and it'll take me to the place where I have tagged something that I'm supposed to do. "There's something wrong with this script" is a note that I left myself. And so I can do sort of task management at a fairly low level
here. All right, and that can be rendered out to something as well. I think I had a minute to go about 45 seconds ago, so I'll just wrap it up there. Let me go back to my presentation and talk about some brief challenges in this approach.

One challenge is you have to use Emacs; there's just no way around it. Org-mode is a markup language and a functionality that's written in Emacs Lisp for Emacs. Other editors could mimic some of the capabilities. reStructuredText and Sphinx is pretty close: it is extendable, but it lacks some of the editor features, the editor integration. And getting the exported format perfect: you know, journals want perfect LaTeX in their format, so that could be challenging. But I find that I actually don't care about the LaTeX anymore; I just want to read it in org-mode so that I have the easy navigation and the interactivity. But my students like to read HTML and PDF.

So let me just conclude here. I think reproducible research in general is going to need new tools, new workflows. I tried to show you some of the ways I've solved those problems. They're going to be, for me, tools I can customize. This was really a game changer; two years ago I started doing this, and it really enabled all of these things, the last one being one of the most important: managing my sort of daily responsibilities. The key features that enable this are an extensible editor, an extensible markup language, and scripting. These slides are available at this GitHub repo. So thanks, and I'd be happy to take questions.

So, it is plain text. All right, this is just plain text that's in there; you can look at it that way. It's easy to integrate with Git. Because it's mostly prose, you have to think a little bit about whether you want every sentence on a single line, for example, so you get good diffs. If you write normally, then you have one long line that wraps in an editor and your diffs don't make sense. But if you're not a diff user, then it's just fine. Mm-hmm, yeah. So the
question is: can you debug the code? This whole thing is in org-mode, so you can, with a command, get into a temporary buffer that's in Python mode, and then you can use whatever functionality Python mode has to do debugging or whatever. For those of you not familiar with Emacs, down here it says Python, so this little buffer down here is now in Python mode. And you probably could; I'm not an IPython user. For writing applications, for writing books like this, I tend to have such small snippets as examples that it's not necessary, but there is an IPython integration that I don't use.

Hmm, yeah. So I have the opposite problem: I'm the advisor, I just force my students to use org-mode. I think the answer is it depends on what both parties are doing. Any editor can write org-mode, right? It's just plain text with some fairly simple markup, so if you were willing to kind of take the burden of, you know, fixing it up... That's correct, so far that's correct. Yeah, it's like using Word: if you want to collaborate on writing something in Word, just about everybody has to use Word. So it's the same here.

Yeah, so typically I have, this is sort of my dashboard in org-mode, and these are current projects that are all links; I can just click on them. There's something in org-mode where you can have a directory full of org files, and that's your agenda directory, so it will scan through all of those for things to do. So those are sort of the two ways to do it: you just maintain a dashboard, or, so I have this little quick command down at the bottom that gives me a list of like the six most common things that I need to go to from anywhere. All right, thanks.
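
The toy Python example described in the talk (import os, list the directory, sort the files, print them) might look roughly like the sketch below. The slide itself is not reproduced in the captions, so this is only a guess at its shape; the function name sorted_listing is hypothetical and chosen here for illustration.

```python
import os


def sorted_listing(path="."):
    """Return the entry names in `path`, sorted by Python's default string order.

    Hypothetical helper, not taken from the talk's slides.
    """
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # Print one entry per line, similar in spirit to piping `ls` to `sort`.
    # Unlike plain `ls`, os.listdir also returns hidden (dot) files, and the
    # ordering may differ from a locale-aware shell sort.
    for name in sorted_listing():
        print(name)
```

In the workflow shown in the talk, a snippet like this would sit in a code block inside the org file and be run with Control-C Control-C, with the printed listing captured into the buffer rather than pasted in by hand.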
Info
Channel: Enthought
Views: 46,006
Rating: 4.9713264 out of 5
Keywords: Python (Software), SciPy, Emacs (Software), scipy2013
Id: 1-dUkyn_fZA
Length: 21min 16sec (1276 seconds)
Published: Tue Jul 02 2013