Explore Your Data and Then Let Others Do It Too: Plotly Express and Dash - Nicolas Kruchten

Captions
Can everybody hear me if I talk like this? Louder? Okay, I guess it's easier without the mic. So, thanks for having me. My name is Nicolas Kruchten, I'm VP of Product at Plotly, and I'm here to talk to you about... it's a bit of an ambitious talk, I only have twenty minutes, but I'm going to try to talk about two great open-source libraries: Plotly Express and Dash.

Plotly Express is a library we just launched, ten days ago. I'm very excited about it; I'm the lead author. It's a library for data visualization with a special focus on data exploration through rapid iteration. You've got the website right there (it turns out plotly.express is a valid domain, so we got it), and if you just want to play with it, it's a pip install plotly_express.

The other open-source library I'm going to talk to you about is called Dash. Dash is a framework for creating analytical web applications in pure Python. The tagline for Dash, and I'll show you why, is "no JavaScript required". Again, you've got the website for Dash right there. Just a quick note: both of these libraries are MIT-licensed, so they're totally free for any purpose. Plotly has a couple of commercial products, but I won't talk about them tonight, so everything I'm showing you today you can do yourself at home, or on your laptop right now if you like.

A quick show of hands before I get started. Who here uses Python mostly for data science? Mostly for web development? All right. And if you had to make a chart right now in Python, would you use matplotlib? Raw matplotlib? Seaborn? Bokeh? Plotly? Okay. HoloViews? And the web developers, are you using Django or Flask? Okay. It's not necessarily my intention to replace any of those libraries in your daily toolkit, but I'm hoping to entice you a little bit with both of these. Most of my talk will just be some live demos, hopefully.

Plotly Express is most useful from a Jupyter notebook, and it's intended to be as easy and simple to use as possible. So I'm going to set up a bit of a role-playing scenario: let's pretend that I'm a data scientist, it's 9:00 a.m., and I've gotten a new dataset that I don't know much about. I'm going to fire up Plotly Express and see if I can make any sense of it.

Plotly Express is pretty easy: you just import it. The dataset I'm going to use is the tips dataset. It's built into Plotly Express for demo purposes, and it's pretty small: 244 rows, where each row represents a restaurant bill. You've got some categorical columns and some numerical columns: the total bill; the tip; sex, which is the sex of the payer; smoker, which is whether there were any smokers in the party; day and time, which are more obvious; and size, which is the size of the party.

So here's the dataset. It's kind of interesting, and as a working data scientist I'm going to explore it, so let's see what's going on with Plotly Express. Let's make a basic scatter plot: px.scatter, I'll give it my data, and let's look at total bill versus tip. One line, and pretty much what you'd expect; it does what it says on the box: scatter, x equals total bill, y equals tip. A couple of things to note: this is a fully interactive plot, so you have hovers, you can pan, you can zoom, and it sets up the axes for you, which is kind of nice.

But let's see if we can dig into this dataset a little bit. It's kind of hard to see what the distribution is here, so let's add some marginals: marginal_x equals box, marginal_y equals violin. Now I've got marginal plots: a violin plot, and a box plot with some outliers. Kind of interesting. Another column I've got here is sex, so let's take a look at that: color equals sex. All right, it's starting to split the data for me.

So the basic principle of Plotly Express is that you give it a data frame, and then for whatever visual variable you care about, like the x position, the y position or the color, you just tell it the name of the column, and it handles the rest for you. It assigns colors, it creates the legend, and everything's nice and interactive. Everything's cross-linked here: I can pan, I double-click to reset, I click to hide a series, I click to show a series. Everything works out as you'd expect.

And I can keep going. Let's say I'm interested in seeing whether there's any kind of trend here. Okay: women and men, when they pay, tip at approximately the same rate; these trend lines have roughly the same slope. You can mouse over here and it shows you the trend line, it shows you the equation, and it computes the R-squared for you: the kind of basic stuff you'd expect from a data visualization library.

But there are some other columns in this dataset, so let's take a look. Instead of these marginals, let's look at some other variables, like whether or not there were any smokers. Now I've faceted my data. From the point of view of Plotly Express, whether your data is on one or more subplots is just like whether your data is blue or red: it's just another visual variable. You shouldn't have to do a big song and dance about declaring subplots, knowing how many you have and creating all the titles; Plotly Express just does that for you. So here you can see non-smokers on the left and smokers on the right, with a slightly different slope on the trend lines. The R-squared isn't very good, so it's probably not a very strong relationship, but maybe it's worth digging into.

I see there are some other columns in the dataset, like day. Let's take a look: maybe I can facet again by day. Yeah, a little bit messy. Maybe instead of coloring by sex I won't color at all, and I won't add trend lines. Okay: facet by day this way, time this way. Oh, interesting. A couple of things to note. First, my days are out of order. This is pretty common with data visualization libraries and it's kind of a pain, so Plotly Express has built-in functionality for avoiding this problem: you can tell it, for day, this is the order I want. Second, for lunch and dinner, you can see that, interestingly enough, only one person in this dataset had dinner on Thursday. Maybe we should close the restaurant on Thursday evenings; it would save us some money.

So you can see that it's fairly easy to very rapidly slice and dice your data with just one function call in Plotly Express.

Now, we do more than just scatter plots. Now that I'm looking at my data by day, I'm interested in how much money I'm actually making. I can do a bar chart: x equals day, y equals total bill. All right, the order is out of whack again, so I can just copy the same ordering here. You'll notice that the arguments to bar and scatter are basically the same, which is kind of nice, and every row in my dataset here is a little rectangle, just like every row in my dataset in the previous plot was a dot.

Plotly Express comes with a whole bunch of different plotting functions out of the box. It's not a competition for who has the most chart types, but you've got bars, you've got scatters, I can do maps (I have a couple of examples of that for you a little bit later), line charts, and some interesting multidimensional types like the scatter matrix, which is actually really cool: you see every variable plotted against every other variable, when you make a selection it's cross-linked across all of them, and you can pan and zoom. One function call, ladies and gentlemen: scatter matrix.

So this is Plotly Express. I don't really have time for a full, detailed tutorial, but that gives you a taste of the kind of data exploration you can do with just one library and a couple of simple function calls in a Jupyter notebook.

So that's my morning as a data scientist. But in the afternoon I need to communicate my results to other people in my company, this very thriving restaurant business with my 244 meals per week. I need to share these findings with the other waiters on staff. The problem is, they don't use Jupyter notebooks and they don't want to type Python code; I'm the only Python nerd in the group. So I would be stuck, except that there is Plotly's Dash library.

Dash is a framework that lets you very easily build web applications in pure Python. I don't know JavaScript; I'm just enough of a nerd to know Python, and that's about it. But I do want to share my cool charts with my colleagues. So how am I going to do that? I'm going to build a Dash app, and actually I'm going to live-code one for you.

This is a basic Dash app. You just import the dash package and some HTML components, you create an app, and then you give it what's called a layout. If anybody here knows HTML: I said no JavaScript required, right? You do have to know a little bit of HTML to get a Dash app going. I've basically got two headers: an H1 saying "Demo: Plotly Express in Dash", and an H2 that just says "I'm a subheader". Then I run my server. I'll give it a second to pick up the changes and switch to the tab. All right, hopefully that's clear enough. Fairly simple. This isn't really an app so far, but I can lay out the frame of a web page and I can serve it.

It wouldn't be an app without some interactivity, so I'll show you how to set up interactivity in pure Python. Here's my first example. Things have gotten a little bit more complicated: I'm importing dash_core_components as dcc, which gives me interactive dropdowns and that sort of thing, plus some inputs and some outputs. My layout has changed a little bit: I've now got an input with the ID x, and my level-2 header here has the ID x_out. I'm declaring a simple function, which I decorate with app.callback, and I'm saying that the children property of x_out depends on the value property of x, and the relationship is just: return x.

Let's see what happens when I run this app. I save it, the app should reboot, and it reloads. Okay, now I've got my input field here and I can type. So what's happening under the hood? If I look at my network tab, any time I type up here you see (this is a little tricky to see) some POST requests happening. The web browser is making a bunch of calls to my Python app, which is just returning x over and over and over. So I'm able to build a reactive application that includes dynamic components on the front end, without writing any JavaScript, which is kind of neat.

Okay, let's move a little beyond inputs and text and get to some charts. In the second example I load up Plotly Express and the tips dataset again, and grab all the names of the columns. Instead of an input, I use a dropdown, and the options are all the different columns of my data frame. Instead of an H2 I have a graph, and the figure is a scatter plot; I initialize it to an empty scatter plot just so it looks nice. My callback now says that the figure property of my graph is just the output of a Plotly Express call. Let's see what happens. All right, here's my scatter plot, here's my dropdown, and I can now pick tip, total bill... As I move the dropdown, HTTP calls are happening to my app in the background, and I can see the new chart being populated.

Okay, seeing just the x-axis is not that interesting, because the y-axis here is just the order of the data. So let's get to the next level, where I want to control x, y, color, facet_col and facet_row. Instead of one dropdown, I have a list of them, a whole bunch of different dropdowns, and my callback has a bunch of different inputs, but it's still basically the same: I'm just mapping these values straight into Plotly Express. Let me help it along a little bit: x is total bill, y is tip, color is sex, facet_col is smoker, and facet_row is day.

All right, so now I've basically built a little web application that I can host on my laptop, upload to Heroku, or host like any other web application. This is all built on Flask, by the way, so it's just a Flask application that I can host anywhere. Instead of giving my findings to my colleagues as a Jupyter notebook and having them code Python, I just send them a link to this application, and they can poke around with these dropdowns. This isn't all you can do with Dash, but it's a neat example of how easy it is to hook up some front-end input elements to some charts in about 30 lines of Python.

That's kind of exciting, but it's a little bit ugly, so let's style it a little bit. I just added a couple of little style elements, to show that it's not all Times New Roman. So there you go: a slightly nicer version of this thing, colored by smoker, split by time and day. Fairly easy to share my findings, insightful as they might be, with my colleagues using Dash.

So that's a lightning tour through Dash and Plotly Express. Drilling in a little on what we can do: Plotly Express is basically a wrapper around plotly.py, which is Plotly's fairly mature, battle-tested data visualization library, but it hides all the details for you. Some of the charts I made here with one function call would take 10, 20, 30 lines of plotly.py code. So Plotly Express lets you be significantly more efficient: you think about the graph that you want, not about the data structures you need to build and feed to plotly.py to get that graph. You can stay at the data and analytical level of your thinking.

Plotly Express supports all sorts of chart types beyond the ones I was able to live-code for you here today: 2D and 3D charts, ternary charts, maps of a few different kinds, built-in animations, faceting, and the trend lines and marginals you've seen. And it inherits a whole bunch of cool features from its big brother plotly.py: you can export to any format (PNG, SVG, PDF, offline HTML), you can export to the underlying JSON representation, we have a free and open-source GUI editor that you can use to edit all the different aspects of your figures, and we have built-in and user-defined themes.

Close your eyes if you're epileptic: all of these charts are just one line with Plotly Express. 3D, polar lines, polar bars with the dark theme, slippy maps, lines on a map, an animated choropleth, a scatter plot matrix, parallel coordinates, parallel sets, contours. All this and more is built into Plotly Express, and therefore it's accessible directly from Dash.

And just as Plotly Express is more than what I've shown you, Dash is a bit more than what I've shown you; I basically showed you one chart and five dropdowns. Here are a few more complicated examples. This is one that we made, which I've already got loaded. It's for exploring support vector machines with a bunch of different parameters: you've got a funky chart here that shows the output of a machine learning model on some random data. You can change the dataset, you can change the sample size, you can play with the threshold of your model. You can also see inputs that depend on each other: the radial basis function kernel has no degree, but a polynomial kernel does, so the inputs adapt to each other. Everything refreshes fairly fast. This thing is up in the cloud, so even though it's making fairly big HTTP calls to get the graph back every time you change one of the inputs, it's still very, very quick, and it's very easy to build an app like this. The source for this is up on GitHub; you have to scroll a little bit, I don't know why, but if you scroll down here you've got the link for this app.

So this is one that we made, but what we're most excited about at Plotly, as with any successful open-source project, is that people are actually using Dash out in the wild. We've seen a couple of papers come out in Nature where actual hardcore scientists are using Dash to present their results to each other, which is hugely exciting for us. This is an example of a website supporting a paper, built entirely in Dash. This isn't just a little app; the entire site is in Dash: you've got tab navigation up here, a user guide. We were super impressed when this app came out; this person really grabbed Dash and used it for all it was worth to present their results. I don't know a whole lot about gene editing, but I know a sexy data app when I see one.

There's another paper that came out recently in Nature about the cost of electricity in sub-Saharan Africa, and here again they're using a lot of features of both plotly.py (this is pre-Plotly Express) and Dash to build some fairly sophisticated models. This one is a little less reactive, because I think they need to execute some complicated code in the background, so they have an update button. But it goes to show the flexibility and power of Dash as an application framework, and of plotly.py as the visualization framework backing it.

So we're very excited to give these two libraries away to the community, to see what kind of apps people build with Dash and what kind of charts people build with Plotly Express. We work very hard to make sure they click together nicely, so that using one with the other is as smooth as possible.

I have time for a couple of questions, if people have any. Yep?

It really depends on the details of what you're doing. Certainly at some point you will run into latency issues. Sorry, for those who can't hear, the question was: at what point do you run into latency issues when you have large datasets? It depends a little on whether the app is hosted on the public internet or inside your network, but it's definitely one of the engineering considerations you need to take into account when building a Dash app. Overall, though, I've personally been surprised at how little of a problem it is; it turns out to work pretty well most of the time. And there are a couple of different features coming out, I think this week, that move some of the processing to the client side by writing a little bit of JavaScript if you need to, so there's a client-side escape hatch for faster processing. But our general recommendation is to build it the naive way first; you might be surprised. If it turns out to be a real problem, there are ways of optimizing things.

[Question, partly inaudible: "...selectors that come directly from that, but let's say I select my CSS in..."] Absolutely. The way it works is that each of these is a component. Dash Core Components is a component library, and we have a fairly mechanical way of wrapping React components, connecting them into the Dash framework and making them available. So a number of third-party components have been built, and we have a couple of libraries, Dash Bio and Dash DAQ, which are widget toolkits for different applications. If you have a particular chart type that you like, or a particular calendar picker, or a very complex control in JavaScript, then to make it accessible to Dash you first wrap it in React and then in Dash. If you go to dash.plot.ly, there's a component builder's guide that walks you through the details. The great thing is that once you've done that, if you open-source it, it's available to all Dash users, and we're always very excited when people contribute new components to the Dash community and make them available to each other.

Okay, well, I'll be around during the break if anybody has questions they don't feel comfortable shouting out. Thank you for listening. [Applause]
Info
Channel: Montreal-Python
Views: 18,543
Keywords: montreal, python, technology, dev, web, science, code, conference, user group, plotly
Id: 5Cw4JumJTwo
Length: 23min 7sec (1387 seconds)
Published: Wed May 01 2019