DjangoCon 2019 - Jupyter, Django and Altair - Quick and dirty business analytics

Captions
Is it working? Yes? OK. Hello everyone, my name is Chris Adams. Thank you very much for giving me your attention for the next 25 minutes. If you've ever been to DjangoCon before, you may remember me from DjangoCon Europe in Florence in 2017, when I spoke about Django and climate change. If this interests you, I'm running a workshop tomorrow, called "Greening your Django project", to apply the ideas in that talk to your own project.

If you've heard the name Chris Adams before in the Django world, well, this happened last night: "Oh, you're Chris Adams, I use all your stuff!" And sadly, that was that guy, not this guy. I am the less famous, less well-known instance of Chris Adams in the Django community, and I now believe I am doomed to live in this man's shadow. One day we might meet, and I suspect that if we touch we might end up annihilating each other in some kind of weird antimatter explosion. But I digress.

Today I'm here to talk to you about Jupyter, Altair and Django, and I think I'm talking to you because they are three interesting projects. For the next 25 minutes or so I'm going to split this talk into three main parts: I'll talk to you about notebooks, share some useful theory about visualization, and then show you how to apply this in a Django project. So, are you sitting comfortably? Then I'll begin.

OK, so let's talk about notebooks first. First of all, though, I feel compelled to congratulate you all on your life choices. In choosing to learn Django, you have chosen to learn what is very likely the most popular web framework for the most popular dynamic language in the world, with the largest, healthiest community ecosystem. Go you! This is about as close to winning a lottery as it gets when you're thinking about being a developer, because if you can think of a problem, the chances are that someone's actually working on it, and they're doing so with a stable, non-shifting ecosystem and a well-maintained standard library.

Things are changing, though. Daniele Procida said something to me earlier this year which caught my eye. He mentioned the PyCharm survey from 2018, where they surveyed more than 20,000 developers from 152 countries to see how they're using Python, and 2018 was the first year that people were using it for data analysis more than for the web. I think it's fair to expect more of this. As people who work with data analysis first come to the web, and as we as developers increasingly need to think more about data analysis ourselves, I think it's worth looking at some of the tooling that they use, because it might be useful for us. If you've worked with Python for data analysis, the chances are very high that you will have come across Jupyter, or at least heard of it, because it's basically the tool that everyone uses now.

Let's look at these in more detail. When we look at a notebook we see a few interesting things. We can see some mixed media: there's some Markdown and some mathematical formulae here. We can also see some code snippets, so there's input here and then there's some kind of output, and there are all these slidy widgets which suggest that they're interactive. But when I look at this, and this is the screenshot from their own home page, it's not obvious to me what they're for yet. The term that the Jupyter project team use is "narrative": these are good for making narratives. So what's a narrative? Let's unpack that.

There are four things that make it a narrative. Narratives are collaborative, in that you might write one with the expectation that others will run the same code as you, or maybe tweak it here and there and follow along with it. They're shareable, in that they exist primarily in a browser, so if you want to share with someone you just share a URL. They're publishable, in that the notebook itself shows your commands and also shows the output, like the return values from a function; it shows them together, and it also serializes them into a notebook format that you can actually publish on, say, an S3 bucket or online. And they're reproducible, in that once you've seen the results, it's possible to run the entire notebook all the way through to see if you can get the results yourself. That is why they're so popular in academic contexts, because they help solve some of the reproducibility crisis that we're struggling with.

This might feel a bit academic, so what are we using them for in the real world? I'm not sure if I can get over to this, let's see. Where is it? Oh, there it is. So, The Economist uses these. They have all these cool visualizations, and I'm not going to try and explain purchasing power parity in a minute, but basically they do these cool visualizations, and what you're seeing now is, as an English person, what's happened to our currency and how we've got poorer over the last few years. You might think, OK, that's interesting, but what they also do now is share all the source code for things like this. So they say: this is what's happening, and here's the actual source code, so you can actually trust what we're talking about. This is quite common in data journalism now.

What you also see is O'Reilly. Has anyone signed up for the O'Reilly Safari thing or anything like that? Show of hands. OK. If you're with the ACM it's like 100 euros for the year, totally worth doing. They've got some really cool stuff: they're building on top of notebooks to build things like this. This is Peter Norvig, who is a well-known Pythonista, and he is talking about how he codes. He'll write an example of him solving a problem, and then you can jump in at any point to the code and try running the code yourself. If we had sound you'd hear him talking, but honestly it isn't that useful here. Because you can see what he's doing, you can play around with this, and you can basically run arbitrary Python on some servers somewhere which aren't yours, which I used for a cheap gag like this. But you can see that, OK, yep, that's being run. If I was actually doing something here I could see what's happening: it'll come up and then, boom, things come back. So you can interact and experiment with stuff as you work through this.

If you work in DevOps or anything like that, Datadog incorporate this into their platform now. Whenever there's an outage, or if there's a runbook, you can intersperse READMEs and write-ups with what queries are doing and what's coming out of them, so other people can see what you saw at the time and why you made a decision, or why you might make another decision in future.

And Netflix are really, really big on notebooks. They've built all this tooling around them, to the point that there is a thing called nteract now, which is a really easy-to-install Electron wrapper around Jupyter notebooks. They use it for ad hoc analysis and they connect it to all their big data pipelines. But they do some other interesting things: they run them as cron jobs. Instead of having plain cron jobs, they run a notebook to do a load of work, and then it shows all the results of what they're doing in context, and they store everything in a massive S3 bucket. You can use all these things, because pretty much every single thing that you see here is open source, and that's really, really useful. They run something like 150,000 of these notebooks every day to do different kinds of analysis, and thankfully we can use some of that code.

It might be worth thinking: how is any of this possible? There's a clue in the name. Jupyter is a polyglot project, and it came out of what used to be called the IPython Notebook. Basically, "Ju" stands for Julia, the high-performance programming language; "pyt" is Python, which I don't need to explain to this group; and the "r" is for R, the stats language, which is a very, very well-known tool for doing any kind of statistical analysis. The reason this is possible is that we've got this diagram here. It might be us using a browser here; there's a notebook server in the middle, and that doesn't do that much work itself. It just passes on work to a kernel that executes the Python or R or Julia or whatever you want, and then it keeps a note of what gets returned when you run a function, and it writes that into a notebook file that can be shared. This means that the kernel can be anywhere in the world and can be written in any language. It can be a huge compute cluster on, say, Google's cloud, or it can be a running Django shell process, so you can interrogate your own Python application or your own Django application.

This means you can access something through a browser rather than just using a terminal, and that's the other thing. When we have a browser we have all these things available to us that you don't necessarily have in just text. Text is useful, but it does have limits when you're trying to convey meaning or compress information into a particular space. So here's an example. You can do this, but please don't, right? You can basically change the string-representation method on a model to return things back, and because we're working with Python 3 we can do stuff like this now. Now, I'm saying you can do this, not that you should do this, because there are all kinds of reasons why you wouldn't. But if you did do this, then this is actually a fairly dense way of communicating information about this thing, which could take into account the state of the model and so on. So you could have a scenario where maybe you're working in the shell, and you might say, OK, I do this, and then the output might be something like this. Or if you try to do something like, say, a list comprehension, you would get back something like this. Now, don't do this, all right? I'm mainly sharing this with you to get across this idea of having multiple ways to represent an existing data structure, and some ways can be more information-dense than just having plain text.

This is why I want to talk to you about the notebook parts, because if you've got a whole browser you can do things that you couldn't do before. I'm going to try and find my pointer again to move it over here. Is that going to work? Yeah. So if you've got data that lends itself to being tabular, rather than showing an approximation of a table we can show a real table. Or if we've got, say, a bunch of dicts, dictionaries that is, then you can try representing that in a more webby fashion, so you can explore stuff and see what it looks like. There are all these things you can do when you've got a browser rather than just a terminal. And depending on what the data is, there are other things that allow you to represent something in a way that's more true to the underlying data. This is what you might do with GeoJSON in a browser with notebooks: because you know it's spatial, you can show it on a map. This animated GIF is basically showing GeoJSON on Earth, but also GeoJSON on Mars, using some kind of map tiles. The idea is that there's more than one way to represent a data structure, and I think this is actually a really useful idea to hold on to.

Another thing is that when you start working with visualization you think, oh wow, there are all these things I can do, and it's very easy to get it wrong. For example, when I think about pie charts, I assume they add up to 100%, and when I look at this I'm not sure that it does. So this makes it harder for me to understand it. As Pythonistas, if our main job is writing code, it'd be nice if we could do something like a quick pip install of this knowledge.

At this point I'm going to talk a little bit about some theory, because I've been having to explore some of this for work recently. If you had to buy just one book for the next 10 years to help you understand visualization better, I suggest it's this one here by Dr. Tamara Munzner. Her book was life-changing for me. It's common to read a book and think, oh, that's cool, and that's cool; this was the main thought I had when I was reading this book. I've linked to a video, which is an hour long, which presents all the ideas in the book in a really, really nice format, and seriously, it really changes how you think about visualization and presenting things.

The key takeaway from that talk, which I'm going to run over quickly, is that in the world of data vis there are lots of similar ideas that map really nicely to our concepts. In data vis you've got dataset types, which are a bit like our data structures, and they might come in different shapes: you might have tabular ones, you might have links and graphs, or trees. This stuff should feel really comfortable to you if you're used to working with data. Each of these dataset types has items inside it, and the items have attributes, just like ours do. These attributes can come in different flavors. We've got categorical, which are different kinds of things; then there's ordered, maybe ordinal, which is discrete rather than continuous; and quantitative. Once you've got some items in a data structure with attributes, you might represent them as a mark on a page or on a screen, and these marks might come in different flavors: you might have points and lines and areas. Then you encode information about each of these marks to convey meaning using channels. You have different channels available to you: position being one, color being another, or shape, and you can use these in combination to convey a greater amount of information in a limited space.

Just to make this feel a bit more comfortable, I figured I'd share some examples. On the left-hand side we have a bar chart, so we've got a bar mark, and we're encoding information in two parts: we're using the x position for this part here, and the length. Likewise we can change the mark and convey the same information as dots. But if you wanted to encode more variables in this, you might choose to have color show something else. Likewise, if we wanted to encode size, then once again we can encode more information in the same amount of space.

At this point you might think, well, OK, this is cool, but how do I know that I'm doing this right? How do I know that I'm actually using the correct kind of channels to encode the correct kind of information? The nice thing is that people have been thinking about this for a really long time, and they've actually been testing this kind of stuff with various tools, and there are helpful tables like this which you can check. This helps us understand why bar charts are often so popular and so effective: they're basically positioning things on a common scale, usually in a series from left to right, and we might use different colors to explain why they're different. This is actually quite useful because it gives us a cookbook, a set of things that we can refer back to if we're trying to find a way to communicate a dense amount of information to people who will often be either distracted, or working on a number of other things, or who just don't have that much time.

So we might think, OK, how do we use this? This is nice, but I'm just sharing academic theory with you. The nice thing is that all this really solid foundational stuff is encoded into a library called Vega-Lite, and all these diagrams here use the vocabulary I just shared with you. So you've got some really cool whizzy analysis and everything like that, but there's basically a JSON data structure to describe these that you can display. I'll just show you an example of it so you can see what it looks like. We've got a simple bar chart here which is showing rain in Seattle. We've got a data set coming in, and then we're choosing to represent that using the bar mark that we had before. Then we're encoding: in x, along here, we're encoding the time, and for the amount of rain we're using the precipitation field which is inside this, and we're saying it's quantitative. That's how we end up with a fairly simple diagram like this. This is cool, but we can do more. We can have, say, two marks on the same chart to show us things like the mean amount of precipitation, or more besides. You can add more information into the same space.

This is cool, but I'm now making you think about JSON and JavaScript, and I'm at a Django conference where we're more comfortable using Python. Thankfully this can be done in Python. It'd be really nice if we could have something like this: we import a URL to show something, and then we just use our Python library to say, well, I'd like that, and I'd like that. And this is basically what Altair is. Some people have taken the ideas of Vega and Vega-Lite, with all the solid theoretical underpinnings, and they've written a nice Python wrapper around this stuff. So all the cool whizzy charts that I'm showing you on this page you can make using Python now.

I'll explain what it's doing under the hood, because that might help give you some understanding here. It's called Altair, and what Altair is doing is this: we're using a DSL here which spits out some Vega-Lite. Vega-Lite then creates another, more dense version called Vega, and then we start working with D3. Has anyone worked with D3 here? OK. How many of you enjoyed it? Yeah, right. And then that compiles down to SVG, or for performance you might want to have canvas. So this lets us play around and express information that we actually have available to us in a very, very dense form without having to understand the entire stack. But if we do need to get help working with the stack, we're using common, well-known, popular tools, so it's possible to actually get help with this. And I just want to say thank you, Jake. I don't think I'm ever going to meet this man, but he's been working on this more than anyone else, and I think that if you're working in open source it's really, really useful to acknowledge this stuff. Come on, the Wi-Fi has been awesome today. Right, anyway.

So now we're talking about applying this. What can we do with this? You remember how I was talking to you about this diagram, where we've got a browser here, and then a notebook server, and a kernel doing some work? Well, if we were to implement this in a Django application, it would look a bit like this. We've got a smiley face here speaking to a browser, and then we'd be using Altair to generate some JSON that we'd render in the browser, and then we might have the ORM or something, maybe SQLAlchemy, now that we've discovered it does all these cool things.

It might look a bit like this. We've got some basic model here. We might do a call like this, and we'll get our values back rather than the objects, because it's easier to work with. Then we'll say: please take this list of objects and put it into a list of dictionaries, and we can work with that. We pass that in, then we encode using the marks that we want, and then we just return it as a JSON response. In the actual page that we'd be showing, we'd have our URL to hook this up, so the next page makes sense, and we'd render it with something a bit like this. We fetch Vega, Vega-Lite, these bits here, and then we might have a div called "vis" that we will replace. Then we just fetch our thing, and we say: please turn the JSON that we get back into a visualization. And that's basically it. That's how we can work in Python, use tools we're comfortable with to explain ideas and explain things visually with people, and actually make it available to people on the web.

I'm going to try a quick and dirty example now, because I did use the words "quick and dirty" in the talk title. Has anyone heard of Drawdown here at all? OK, cool, we've had two or three people put their hands up. This is really, really cool, because, OK, climate change is a thing and we should probably be thinking about it more as professionals, and Drawdown is basically a project where people have looked at the 100 most substantive solutions you could apply now. There are all kinds of questions about the assumptions made in this neoliberal kind of framing, but there is a lot of interesting stuff inside it, and they do have a ranking of all the kinds of interventions that we could actually make to, well, I don't know, stay in an inhabitable world. I'm going to show you what some of this looks like with those quick and dirty code examples, to prove that it really does work. But before I do that, I'm going to show a screenshot, just in case the demo gods don't smile upon me. So let's see if we can go to this and grab it now.

OK, can you see this? Yes, you can. Cool. What you're looking at here is basically the code that I showed you a minute ago. Am I able to increase the size of this? Oh god, what have I done there? Let's close that. All right, there we are. So this shows us all the things you might want to do. We might think of things like, OK, electric cars are going to save us, or anything like that, but when you look at this you can see that electric cars aren't actually that much of an intervention. The single biggest thing that looks like it's going to buy us time is actually fixing how fridges are disposed of, because they release these CFCs, these gases which are really, really bad news when they're in the atmosphere. But you also see some other really big things here. Food waste, which we were speaking about yesterday, is a massive lever. So is diet, which came up a few years ago. But you can see down here two of the biggest things, which if you combine them are probably the biggest thing: treating women with dignity really, really helps, it turns out, because it changes how families work. If you have access to family planning and give people access to choice, then the way that families grow ends up being different as well. There's loads of interesting stuff which we wouldn't have seen if we'd just looked at, say, a single CSV file. You can explore this stuff yourself, and there are loads of other ways of presenting this. It looks different mainly because I wasn't sure if you could actually see it on this screen.

So I'm just going to try and close this now and use the last of my time. Is that right? Yeah, that worked. If you do care about this stuff, I'm working with a company who basically paid for us to work on and find out this kind of stuff. They're called Spend Network. They're looking to use these ideas to take the last ten years of open data to find out how the public's money is being spent, and see where the biggest levers are in terms of climate change, so you can do something there too. They're hiring, so please do speak to me afterwards.

I said I'd talk to you a bit about notebooks, give you some theory, and show you how you might apply this theory in your work, so that you can present things in a visually arresting, interesting and dense way using these tools: Jupyter, Altair and Django. I think I did that, so I'm going to say thank you, everyone.

By the way, I have this weird opinion: if we're professionals, then we should think about the harm caused by how we work, and we should try to minimize it. If you're not doing that, I would argue that you're not really being that professional. I'm running this workshop tomorrow to explore that with other people, because the biggest lever we have is the fact that pretty much everything we build runs on fossil fuels right now, and this is such an easy thing for us to fix. So if that interests you, and the continued existence of humans interests you, please speak to me or come to the workshop tomorrow. This deck is online, and the code that I showed you, the Django app, that's also online, so feel free to fork it, and you can see how you can use IPython to speak to a Jupyter notebook and have some fun with it. OK, that's it. I think I've got time for questions?

[Host] Yeah, we have about four minutes for questions, so we can do the DjangoCon Q&A hashtag online, or you can line up for questions. There's always a question.

[Audience question] Thank you for your talk, it was very, very informative. It's great to have a library that provides you with the means of doing very different kinds of data analysis and visualization, but do you have any resources on how to choose the correct type of visualization to provide? Because that's pretty tricky.

Yes, there is actually a really good resource from the Financial Times. They do actually list this, if I can find it. Right, so the Financial Times have this list, and it's also using the Vega and Altair tools. This thing was produced by, I think, the Financial Times, saying this is how you should use our charts. They use it internally, and they shared it because it makes them look cool, and then someone has taken that idea and thought, oh, that's kind of cool, maybe I can implement that in Vega, which means we get all of that. So how cool is that? That's my answer.

[Audience question] Thank you for a very interesting talk. You've shown a picture of the Netflix architecture, and you don't have to scroll back. Even though most of those things are, as you said, open source, they are probably on a different scale. So I'm wondering, how can you use the code in Jupyter notebooks without running a kernel? I'd like to reuse it in Python without the overhead of the kernel; I don't want to have a kernel running on the servers and so on.

OK, so the approach I take, and the thing I've been using, if I can share this... is it going to work? I'll check. Oh no, it's not. So, I was going to show you an example of some code in Jupyter. I'd be using it in a scratch session, like I would be using IPython, and then I take those bits and put them into a class and call methods that way. That's the working-in-the-REPL approach that I guess the Clojure community are known for, and a few other ones are. So you don't actually need to be running all this stuff. You can just work out the bits of Python that you care about and then call those. You could call this function on the server if you really wanted, but I know there's more to it than that. So that would be my option. I'm happy to talk in more detail afterwards. Yes, sir?

[Audience question] Thank you. What is the best way to make the notebooks available to other people? Because you said it's just a URL, but how do you make it not publicly available on the Internet?

Ah, there are a few ways. If you use VS Code, or most IDEs now, they've realized that notebooks are really, really handy, and because lots of the cool new editors are basically running on Chrome, you can use that stuff internally. You can export things as Python as well. Google also have a thing called Colaboratory, which is free to use, and it just gives you some magic kernel somewhere that you can use. There's also a tool called Binder which lets you run these things internally as well, which basically spins up a load of Docker containers according to your requirements file, and then you can have that on your own hardware. Or if there's no reason to hide what you're doing, or no reason not to share it, you can use it on public infrastructure as well. There's also this massive European science cloud where they just provide kernels for you to plug into. So yeah, there are lots of options. I can add some links to this talk afterwards, actually, or file an issue and I'll add some more like that. Thank you. Yes, sir?

[Host] Hi, we are all out of time and we have a great speaker next, but I'm sure we can find you around.

Awesome. Yay! Thank you, everyone.

[Applause]
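
[Editor's sketch] The Django-to-browser flow described in the talk (rows from the ORM as a list of dictionaries, a bar mark, an x and a y encoding, returned as JSON) can be approximated with the standard library alone. This is a minimal sketch, not the speaker's code: the `ROWS` data, the field names and the helper names `bar_chart_spec` and `chart_json` are all hypothetical, and the numbers are made up. In the talk, Altair generates the equivalent Vega-Lite JSON, and a Django view would hand it to the browser, for example via `django.http.JsonResponse`.

```python
import json

# Hypothetical rows, shaped like the list of dicts you would get from
# a Django queryset's .values() call. Names and numbers are made up.
ROWS = [
    {"solution": "Solution A", "score": 3.0},
    {"solution": "Solution B", "score": 7.5},
    {"solution": "Solution C", "score": 1.2},
]


def bar_chart_spec(rows, x_field, y_field):
    """Build a minimal Vega-Lite bar chart spec as a plain dict.

    One mark ("bar") and two channels: x position for a categorical
    (nominal) field, y position for a quantitative field.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": list(rows)},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


def chart_json(rows, x_field, y_field):
    """Serialize the spec to the JSON string a view could return."""
    return json.dumps(bar_chart_spec(rows, x_field, y_field))


if __name__ == "__main__":
    print(chart_json(ROWS, "solution", "score"))
```

On the page side, the JSON would be fetched and rendered into a div by the Vega-Lite JavaScript libraries, as the talk describes. Building the dict by hand only shows what is on the wire; in practice Altair's chart objects produce it for you.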
Info
Channel: DjangoCon Europe
Views: 3,203
Keywords: django, jupyter, altair
Id: x6qxpm_SSZ8
Length: 27min 40sec (1660 seconds)
Published: Sun Apr 21 2019