Build a dashboard in under 30 minutes with Streamlit!

Captions
Today I'm talking to Tyler, and we're going to be building a Streamlit app. Tyler is already a very advanced Streamlit user and is currently authoring a book about Streamlit, and I figured today we would use the time to talk about a business problem on Interview Query and build a Streamlit app for it, an actual business case that I'm trying to solve anyway. Awesome. Yeah, I'll go first. Hey, I'm Tyler. I'm a data scientist at Facebook, and I've also spent a lot of time thinking about how to turn my analyses into things that other people can actually interact with. I feel like a lot of the analytics that data scientists do turn out to be static documents: a Word doc or a Notion doc, or some metric at the end of a machine learning model, just expressing to someone, "Oh, this is the precision and recall of this model." But in terms of the lift required for other people to interact with any data analysis that I do, it's quite difficult. So I started working with this library called Streamlit, which was launched a couple of years ago, and essentially the crux of it is that it makes all of that front-end work really easy: take your Python script and turn it into a front end that you can deploy in a few hours or a day instead of two weeks in Django. So hopefully we can brainstorm and come up with something that might help this business problem that Jay is talking about. Also, for everyone's reference, the two of us haven't spent a lot of time working with this data. I think we got access to it yesterday, so there's going to be a copious amount of Googling and figuring out exactly how to use these tools. That's the fun part. And back on what you said about a front end: data scientists hate front end, that's just a core principle of what we do. So I'm very excited, very bullish on Streamlit, and I'm super excited to have this one-on-one Streamlit coaching 
session happen as well. This is going to be awesome. But yeah, let's get started. Sure. I think the best place to start is just in a new folder, so I made this new folder that I'm just calling Interview Query. Within it I already have a dashboard and an IPython notebook, and I have some environment variables that hold the read database password and everything. Okay. But before we even start on that, let's go ahead and open a Sublime session for this IQ comments dash Python file. The background for the problem that you're actually having, and I want to talk about it more directly, is essentially that we want to figure out which commenters on Interview Query are the highest-quality commenters around, and then we want to do something special and nice for them: give them special features or access, or send them an email, or talk to them, or give them something interesting on the back end. Do you have more context you want to share around that? Yeah, so let's start from a very high level, because I want to break it down a little bit. Interview Query, as everyone knows, is an interview prep platform. You go through and solve problems, and people can answer the problems by writing comments. Over the years we've gathered a lot of really high-quality comments, kind of like a Stack Overflow of sorts, but a paid freemium version. What I have been struggling with is how we generate more comments and increase the number of quality comments on our site every single week. We don't really have a metric for this, mainly because it's extremely hard to measure quality, but at the same time we do have upvotes and downvotes, and we have a lot of signal from replies, comment replies, all these cool features on the site. I think the key thing we're looking for is that one thing we want to do is to start 
rewarding the people who are actual contributors on Interview Query: people who are answering the questions, and whose answers and solutions are actually correct. Then we want to be able to pluck them out, talk to them early on, and give them either a reward or some sort of benefit, maybe a badge, and maybe it could be automated after they pass some sort of trigger. But essentially just say, "Hey, great job, keep doing what you're doing, plus I'd love to talk to you about how prolific you are and give you more rewards of some sort." So almost a little bit of a gamification plan. I think Stack Overflow has done a great job of this. A lot of it has been moderated, but for us, we need these moderator tools to find these users. Yeah, that makes a lot of sense. So before we jump into that, let's talk about how we would actually make a Streamlit app. I have a regular Python instance hosted here, and essentially all you need to do to start your own Streamlit app is import streamlit, usually as st, and then whenever you write anything there's, for example, st.title with "My new Streamlit app" or something. Then whenever you want to actually run it, you can just write streamlit run and the script name (the IQ comments .py file here), and it's just going to host it here. And whenever you have any sort of data or visualization, you can write it in the script and pass it to Streamlit, and Streamlit will have it show up here. Nice. So thankfully it's kind of that easy to get started with the Streamlit app itself. The only thing that I'll hide is that we have this environment variable that allows me to read from your database, and it's actually sitting in a config file. As you can see, it's sitting in this environment file. So I have this mini little script 
here that essentially grabs this engine password for the database, creates this engine, and then creates this connection that we can pass to pandas. I notice your 8501. How many Streamlit apps have you created now? [inaudible] No, so 8501 is the port. It's the exposed port that you're actually hosting this on. There are a bunch of different ways to deploy these apps, and one of them is to take the port that your Streamlit app is served on, that 8501 port, and expose it to a hosting service like AWS or Heroku, and you can host it that way. There's also a really easy way to host a Streamlit app: Streamlit has its own service, which just says, put it in a GitHub repo, tell us where it is, and we'll host it for you. That's stupid easy to figure out. It takes about 30 seconds, especially once you've figured it out, so I always use that instead of some more complicated AWS work. I have to import this config file; I forget exactly how this worked. Then within this database, from what I remember, we have this read, I think it's read_sql. Yeah, read_sql and read_sql_query both work, I think. I don't know why. So you have this table called comments. Yep. Let's just try to read this as a data frame. There's this thing that's kind of called the Swiss Army knife of Streamlit. It's st.write, and you can pass basically whatever object you want to it and it'll just figure out how to render it in Streamlit. So I'm pretty sure this will work. I'm not totally sure, but I guess we'll find out. The other thing is, whenever your source file changes, Streamlit shows this "Rerun" and "Always rerun" option. I usually click "Always rerun." "create_engine is not defined." Well, we've got to import it from, uh, something like that. [inaudible] Yeah, that's actually 
probably good for now. That's pretty good. And then this grabs these comments. It has this created_at column, updated_at... Can you talk about some of these columns real quick? So we have this id, that's probably the ID of the comment. Yeah. q_id is the question ID, uid is the user ID, body is the HTML body, is_deleted is whether the comment has been deleted or not, and content_id, just ignore that, that's for something out of scope. Okay. Can you expand it too? Can you make it bigger? Oh, okay, so you can kind of do whatever with it. Wow, that sounds great. So I'm curious, what is the distribution of comments per user? Yeah, let's look at that. The first thing I'll do is, close your eyes. Yeah, I know there's an easier way of doing that, right? Can't you just do uid value_counts? Yes, I think this will work better. Wow, there's someone who has 71 comments. Looks like there are quite a few very active people with 25 or 30 comments, and then it looks like it's just power-law distributed, with a long tail of threes, twos, ones. Yeah, and I assume there's an even longer tail of zeros; this is only for all of the comments. Yes, there's a very long tail of zeros. I won't guess at how many people on Interview Query have never commented. [Laughter] So let's add some text here: "Streamlit top commenters dashboard." Let's save it. So what do you think is the best path for some of this stuff? I know there's another table, comment upvotes. Yeah, there's a comment upvotes table that is joined on the comment ID: there's a foreign key on the comment upvotes table that joins back to the ID on the comments table. Is there any way that we could find the most prolific users, the ones who comment at an above-average rate in their first month on Interview Query? Because those are probably the people I want to pluck out and say, "Hey, you're doing great, let's keep going." And I guess to do that we have 
to just measure a standard deviation of how many people actually comment on Interview Query. But I have a feeling that if you make even one comment, you'll probably be in the top 10 percent of our users, because I don't think that many people actually make comments. So I was going to say, potentially it could then be the top commenters out of all the existing commenters out there. Yeah. So I think the other thing is we're probably going to want to have some input in this dashboard for how far we're going to look backwards. Okay. There's a number input function where we can essentially ask how many days we want to filter backwards. For example, we don't want to look at all the comments forever; we probably want to look at all the comments for the past 100 days or 150 days, or months, or whatever. "Comment day filter," "Comment reaction day filter." Oh wow, so you can just create a decimal input right there. Be sure that if you do step equals one, then it's going to assume an integer. Right. So now what we can do is filter on the past. These created_at columns, I'm going to pass them into pandas to_datetime and see what happens. Streamlit takes advantage of scripting: it just assumes everything is a script, and everything wants to be run as a script. Let's use it as many times as we can. Awesome. I have to find, for each comment and comment upvote, how old it is, because it just has that created_at column. I think there's a Timestamp.now function, right? Yep. And I can probably just subtract those. I probably want this in days, and that'll get me the days since created. I think that'll work, so let's check it out. Well, it'll actually output here: 543 days. [inaudible]
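The look-back filter described above could be sketched roughly as follows. This is a minimal sketch, not the code written on screen: the helper names (add_days_since_created, filter_recent, render_filter) and the days_since_created column name are made up for illustration, and the only thing assumed about the comments data is that it has a created_at column, as mentioned in the captions. Streamlit is imported inside render_filter so the pandas helpers can be tried without it.

```python
import pandas as pd


def add_days_since_created(df, now=None):
    """Return a copy of df with a days_since_created column (whole days)."""
    # "now" can be pinned for reproducible results; defaults to the current time.
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    out = df.copy()
    out["created_at"] = pd.to_datetime(out["created_at"])
    out["days_since_created"] = (now - out["created_at"]).dt.days
    return out


def filter_recent(df, num_days, now=None):
    """Keep only rows created within the last num_days days."""
    aged = add_days_since_created(df, now)
    return aged[aged["days_since_created"] <= num_days]


def render_filter(comments):
    """Hypothetical Streamlit wiring: a day-count input driving the filter."""
    import streamlit as st

    # Integer min_value, value and step make number_input return an int.
    num_days = st.number_input("Comment day filter", min_value=1, value=30, step=1)
    st.write(filter_recent(comments, num_days))
```

In an app script, calling render_filter(comments_df) after loading the comments would show the input and the filtered table; the default of 30 matches the "first month" default chosen later in the session.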
That's the only time that things will run well the first time. The rest of this, I'm sure that is unlikely. And then we probably want to do the same thing for the comment upvotes. It's also called created_at, right? That was the column name? Yep. Okay, let's see. Cool, that makes sense. And then we want to be able to filter it. I still always filter like this, and I know it's probably not best practice. No, I still do that too. Some days I think that is best practice, actually. Oh, really? Don't people use iloc and all that other stuff? No, I think iloc is way worse, but I'll let other people comment on that. That's what I tell myself, because I never remember exactly how to do it. Yeah, it's weird. And I'm going to do the same thing for the comment upvotes, with num days reactions. Nice. And then this is empty, which makes sense. What is a good default value for these filters here? Let's set it to 30, the first month. So one thing that I'm interested in is the distribution of votes per comment. Number of upvotes per comment on average? Yeah. Now, average is going to be pretty deceptive here. Yeah, you can run average. So let's see if I remember this. Sure. And the vote value could be a one or a negative one, implying that the comment... Oh, it can be, really? Yeah. Oh, so then we should do the sum. Yeah. We can downvote people. Probably a bad feature, honestly; you usually want more positive karma in the world. Yeah, I don't know, but some comments are really bad and I want those to go lower, you know? I grouped by the ID of the upvote, not the ID of the comment, which is why we saw a bunch of ones and zeros. That should do the trick. Yes. And this number here is the comment ID, right? Yep, that's on the left. I'm going to turn this into an actual data frame. Forgot to write this out. Yep. So now we have each comment ID and the number of votes that it has: negative three, four. And let's see if this changes with 2,500 days or something 
crazy. All right, yeah, this seems to be working just fine. So let's call this comment upvotes grouped, and now, instead of left joining this with the comment upvotes, we want to left join it with the comments, with grouped on the right, on comment ID, how equals left. There's getting to be a lot of stuff that we're writing out here, so I'm going to just delete some. So now we have id, created_at, uid, body, is_deleted, and then we have days since created and the vote. Maybe we have to look at more. Maybe there's nothing actually wrong with this; maybe there are just so many comments with no upvotes. I don't think that's the case. It might be, actually. I mean, these are the first few, and you're looking at the last 30 days, so if you sorted by... Yeah, we can just order it. That does not look right. There are a lot of negative ones. There's only a small amount of content that gets any votes. Yeah. Let's expand this to the past three years. I feel like this is more reasonable. Yeah, so that's our problem right now: I think the top comments get the highest votes and then everything else just falls by the wayside. I mean, that's another problem, right? You've got to shuffle comments; you can't just have the best comments always at the top. What I'm also wondering is, no one is finding these people, so is there any way that we can manually find these early commenters? Effectively, I care more about activity and eagerness than actual correctness, almost. I would be really interested in people who wrote five comments in their first five days, versus them all being upvoted and deemed correct. Okay, so why don't we take this section that we have, and instead of a title there's another one called subheader, that's what I usually use. Or we can use pyplot. I think this is how it works, I forget exactly what it is, but you basically write this fig, ax equals subplots, and then you can actually draw graphs here. I just want to get a graph of this, 
the votes by comment. Here we go. Okay, perfect. All right, interesting. Yeah, it's a lot of ones, a lot of zeros. So I'm kind of interested in these people. We're looking for the most prolific users in their first month on Interview Query. Okay, let's go ahead and close out this section: this has the top 10 comments or something, top 100. Is matplotlib the default graph thing? Is that the best one you've found? No, if I'm using anything in production I tend to use Plotly, because it's interactive by default and it's prettier. But I know matplotlib and seaborn better, so when I'm starting and just making something, I usually start with that and then convert it over once I figure out exactly what I'd like. The other thing is that matplotlib figures just do not work super well with Streamlit hosting and Streamlit caching, because they try to set a whole bunch of global variables whenever you make these matplotlib figures, and so it makes it a nightmare to work through, just FYI. That's why you'll very rarely see Streamlit apps that use matplotlib, and you'll see that Plotly is a really popular one. Or seaborn, that's on top of matplotlib, I remember. Yeah, it's built on top of matplotlib, so it has the same problems. So how do we grab some user data? We can grab the users. You could probably run a join with the comments if you want: select from users, left join comments on users' id equals uid, or b.uid, and then a, yeah, b.*. Yeah, that should be good. Okay. Right, that's not right. Let's go ahead and double check. I'm just going to write the columns out here. Okay, so we've got created_at. Yeah, we're getting this better. And then this is the ID of the comment, then created_at is the created_at of the comment, and updated_at is the updated_at of the comment, then q_id, uid, body, body text. Cool. So what's this? Oh, okay, this is the top comments thing.
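The vote aggregation and left join discussed above could look roughly like this. It is a sketch under assumptions: the captions only say that the comment upvotes table has a foreign key back to the comment ID and that a vote is 1 or -1, so the column names comment_id and vote, the function names, and the output column votes are all hypothetical. Filling unvoted comments with 0 is a choice made here so that they still appear in a histogram.

```python
import pandas as pd


def votes_per_comment(comments, upvotes):
    """Sum votes per comment and left-join the totals onto the comments table."""
    # Group by the comment being voted on, not by the ID of the vote row itself.
    vote_totals = (
        upvotes.groupby("comment_id", as_index=False)["vote"]
        .sum()
        .rename(columns={"vote": "votes"})
    )
    merged = comments.merge(
        vote_totals, how="left", left_on="id", right_on="comment_id"
    )
    # Comments that never received a vote come back as NaN after the left join.
    merged["votes"] = merged["votes"].fillna(0).astype(int)
    return merged.drop(columns="comment_id")


def render_vote_histogram(merged):
    """Hypothetical Streamlit wiring: a matplotlib histogram of votes per comment."""
    import matplotlib.pyplot as plt
    import streamlit as st

    fig, ax = plt.subplots()
    ax.hist(merged["votes"])
    st.subheader("Votes per comment")
    st.pyplot(fig)
```

Summing rather than averaging follows the reasoning in the captions: with +1 and -1 votes, the sum gives the net score of each comment.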
Yeah, that's manually curated. I used my super admin powers to find those. But I think that's kind of the goal of this too: how do we find these top comments, you know? Yeah, top comments, top commenters. I realized I screwed something up here: these nulls are not showing up in this histogram, so let's fix that. Let's go back: comments merged, fillna. That, I think, works just fine. I hope it does what I think it does. All right, there we go. Let's see if it goes back to 100 and see what happens. Nice. The good thing is there's not a lot of mass downvoting. Yeah, that's true, that feels pretty good. [Laughter] As long as the upvotes happen more than the downvotes. Do we want to again get some input for how long the user has been around, and also when the comment actually happened? We basically want to grab the same thing that happens up here: users, user created_at. I actually think what we need is the comments that are within a certain time frame, like the first 30 days. So the first thing I'm going to do is get the date difference for the comment, and then you can filter it. It should probably go the other way. Well, I guess these nulls are when the user has never commented. And if we left join, we're going to see a bunch of duplicates for users, right? Yes. This is negative 293. Is it ordered the other way when you did the subtraction? Yeah, so I was right in the first place. The other thing that we can do, nice, is take the absolute value of it. Good hack. And leave it be. Wow, cool, our numbers are positive. [Laughter] Excellent, what a win. All right, so that's the user age at comment, and now we want to filter for all the comments where the user age is less than a week or something, so the value is seven. So these are all the comments that were made in the first week after someone created their account. I want to see if there are any users who made a bunch of comments 
in their first week. We actually just want to count. I'm pretty sure we can count any of the columns and it'll be fine. I think that probably works, and then we just have to sort it. I don't know why I always do it this way; I should always just do value_counts, it makes my life so much easier. You've got to add a filter for is_deleted equals zero somewhere. Oh, is it spammers? Not spammers, but I think we have bugs on our website where people write their comment, look at it, say "no," and write another one, and then maybe their internet craps out and they press the button ten times. Oh, okay, you knocked it in half. Yeah. But overall it's a huge power-law distribution if you look at that. We could maybe plot it somehow; I don't know if that would turn out looking like anything, it's just going to be a huge power-law distribution, right? A lot of them do actually end up doing something. So could I add a filter for users who were created in the last 30 days as well, so I'm not getting a bunch of users who stopped using the site? Okay, so I'm going to make a user age filter in the comment section here, user age, with the default at 30.
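The first-week count described above might be sketched like this. It assumes a users table with id and created_at columns and a comments table with uid, created_at, and is_deleted columns; those names follow the captions where they are spoken, but the exact schema is not shown, and the function name early_commenters is made up. Unlike the absolute-value shortcut taken in the session, this sketch subtracts the signup date from the comment date so the age comes out positive.

```python
import pandas as pd


def early_commenters(users, comments, first_days=7):
    """Count non-deleted comments each user made within first_days of signing up."""
    live = comments[comments["is_deleted"] == 0]
    # Rename the user columns so they do not collide with the comment columns.
    joined = live.merge(
        users.rename(columns={"id": "uid", "created_at": "user_created_at"}),
        on="uid",
        how="inner",
    )
    age = pd.to_datetime(joined["created_at"]) - pd.to_datetime(
        joined["user_created_at"]
    )
    joined["user_age_at_comment"] = age.dt.days
    early = joined[joined["user_age_at_comment"].between(0, first_days)]
    # value_counts sorts descending, so the most active early commenters come first.
    return early["uid"].value_counts()
```

An inner join is used here because users who never commented cannot appear in the count anyway; the left join from the session would add rows of nulls for them.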
Yeah, and then you want to grab the people who were created in the last 30 days. Then what we need to do is actually get this stuff here. [crosstalk] So we are grabbing the users who were created in the last 30 days, and we are ordering them by the number of comments they've made, descending. So user created_at, yep, that's the user age, and then we want to filter. Well, do you have a date picker? Are there date picker inputs as well? There must be, right? I think there might be something similar, I actually don't know. So in the past 30 days, there we go: three, eight, nine, four, five. Yeah, grab these four: you've now been heavily rewarded for your three comments in the first seven days. Well, seven days is a lot of time for three comments. That means they're commenting like... Well, it also depends on the unique dates, right? If I go in and comment three times in one go... Yeah, that's true. I mean, this clearly is just a good exercise in showing how much more we need to promote comments. I think there needs to be a general incentive for people to keep pushing through, and I'd love to track the users that we actually set up to be in this user group: top commenters in the last something, like that. Yeah, so that'll get you the top commenters in the last 30 days, in their first month. Oh, they do have a date picker. Let's just see how it actually looks. So I guess I'm also very interested in, is there an HTML prettify for a lot of these things? Let's say we wanted to use this as an actual dashboard for displaying the text, right? Let's say we wanted to display the actual comments themselves and have someone read them. Do they have features with which people have actually made dashboards to view and interact with and 
resize the tables and stuff like that? Yeah, I think the real winner with Streamlit, the thing I enjoy, is that you can take any Python object and just put it into Streamlit via st.write. So if you have something that generates HTML as text in regular Python, you can generate that HTML in Python, maybe even turn it into an image or text or whatever you want, then export it to the app, and it'll just show up. Yeah, that's true, because then write could format anything you wanted. I'm sure pandas has some features for prettifying everything as well. Yeah, I've seen very nice tables written in Python. I'm not exactly sure how all that worked, but this is kind of the crux of how Streamlit works. Oh, one other thing that I wanted to mention is that we can actually put stuff into columns, so we can split the screen so that it's not just one long... Oh, yeah. I'll give an example at the bottom. If you want to have two columns, then we've got this "with column one," and we can toss whatever we want in there. So let's say we wanted to take this figure here and put it down below, and we also wanted to take the top hundred comments and put them directly below it, and then we want to do the same thing for the second column, but instead we want to look at the top commenters in the last few days. Now what happens is we have two separate columns. It doesn't look pretty, obviously. The other thing you can do is set the width to be wide instead of narrow. Yeah, I'm interested in when they start going the other way, where you can just start dragging and dropping on this, because I feel like when you can interact the other way around too, not just through a one-way interface, it becomes very powerful. Of course, I don't really know the code. So now this is wide by default, and you can see our two 
sections, and you can separate stuff that way. That's the crux of how it works. And then I'll show everyone how to deploy and actually share on Streamlit. Oh, they've logged me out. Nice. This is actually in beta; I'm not sure how much is in beta or whether it's out, to be totally honest, so I don't know what it looks like when someone else goes on. But any time you want to deploy another Streamlit app, you can just say that there's a new app, and because it's hooked up to your GitHub already, you can just tell it where the main app file is. For example, I have a bunch of Streamlit apps here. One of the Streamlit apps I made is this Amazon demo that has a bunch of private stuff in it. This is a private repo at the moment, I'm pretty sure. If I want to deploy this, I can literally take this and paste it into this GitHub URL field, and it'll deploy this app for me kind of automatically. Wow. Oh, the balloons. Wait, what is this app? Let me see this one, the Amazon purchase habits. I saw it on Twitter. Yeah, I can show you what it actually looks like. Actually, I'm not going to do that. Let's go back to the Goodreads one. Actually, I think it's fine, I don't think it shows anything. Okay, I'm just going to show the top two. I won't show the bottom three, because they show more of what your purchases actually are. So as you can see, this is when I was in college and got Amazon and didn't have any money, and this is when I started making money and spent so much money on Amazon. Wow, I like that climb. COVID, 2020.
Yeah, 2020's COVID, and then I purchased everything, from groceries to just as much as I could, on Amazon. I just went for it. And they have these purchases over time, and then I did a really bad moving average just to guess how many items I would buy on Amazon in the next month. [Laughter] Yeah, this is funny. These are actually seaborn graphs, as you can see, because I was too lazy to switch over to something else and I had debugged it enough to figure it out. Okay, nice. You can kind of do whatever you want with it, which is the nice part. So, the Goodreads app. Oh, the other thing is I really like putting these little animations at the top of apps. I think it just makes them look very cute. And this will scrape your own Goodreads, and it's only going to work until Goodreads really deprecates its API. Gotcha. Which they said they would starting in December of last year, but they haven't yet. So as soon as that breaks, this is gone; it's just going to be a 404. The only other thing I'll mention here is, don't tell Facebook, but I've used this whenever I've applied to other jobs. Whenever I'm looking around, or, you know, if you have a job you kind of want to see what you're worth elsewhere, and that's usually a pretty good idea. Whenever I have a take-home, I end up making a Streamlit app, protecting it behind a password or something that I pass to them, and asking them to go and look at that, so they can actually interact with it instead of me just sending them a Jupyter notebook exported to a PDF or some garbage. Streamlit apps are definitely the new hot thing, I think, when people are doing their take-home assignments. Have you heard of other people using it? I have. I actually try to encourage others. They say it looks very impressive. I've heard of 
multiple data scientists on Interview Query who use it. Definitely something good to learn for take-homes. Yeah, exactly, it's just always going to be better than slides. Yeah, that's the de facto. Cool, well, this is sweet. I learned a lot about Streamlit today. I learned a lot about my own app today, after not doing analytics for a while. I'm sure if we spent a couple more hours on it we could get something that's super useful, but I hope this has kick-started some different ideas. I'm already thinking about moving my current dashboard from Google Sheets over to Streamlit. Well, you can always just run Python in your Google Sheets, you know, it's easy. The thing about Google Sheets is that you can instantly go from data to visualization with no-code tools, and I haven't found another one that makes it look as good that fast, you know? No, it's so quick. Yeah, that's one of the benefits that I really enjoy. I love Sheets: whenever I have live survey data or something, when I start I just toss some basic visualization on Google Sheets and use that all the time. But I think that over time, things in Python and this front end become so easy that the difference in effort between making a Google Sheets visualization and making a Streamlit visualization is just shrinking and shrinking. Whereas two years ago, if you had asked me about it, or even a year ago, I would have said, "Oh my god, a Flask app or a Django app is a week of work that is probably not even going to work that well." So the effort was just so high, whereas now we whip something up in not that long that has user input, actually filters data, whatever else. This UI isn't pretty, but it's also an hour. Yeah, I think the best use case is creating these SQL queries: you put them into YAML files, right, and then when you commit them to GitHub, they automatically get run in some ETL 
every single day that updates the Streamlit app. Or every single time you load the Streamlit app it reruns it, I'm guessing? Yeah, every time you load this right now, it's re-running the read queries. Exactly, so every time you refresh the page, right? Yeah, any time we rerun anything. There's also a little button here, "Rerun," that is just going to rerun it all. Nice. But it's not going to sit there for a while, if that's what you're worried about. Yeah. I will say, writing SQL in pandas is good for what we did, which is just a very simple select, join, where, but beyond that I don't love writing my SQL there; I'd like it somewhere else. Oh yeah, you shouldn't see our back end. It's basically just [Laughter] a massive SQL query ported into a data frame, and then it gets exported. Great. Amazing. Okay, well, thanks so much for having me here. I'll post a link to the book that I've written. For background, I wrote this book on Streamlit. I learned a lot as a data scientist from reading longer books on subjects; I don't learn extremely well from shorter blog posts or just from reading documentation. So I wanted to put something in the ecosystem that I would have liked when I first started learning Streamlit, something that I would genuinely use. It includes a bunch of examples and a lot of different ways that you can actually use Streamlit, and we'll put the link in the description. Yeah, I'm pretty excited for this book. I'm down to dive in and invest in Streamlit. I'd love to see the analytics on how well they're doing, and I'm sure they're going to grow bigger and better, and Tyler's book will be the de facto resource, I'm sure, for years. Oh yeah, they're going to just point the VCs to the book and say, "Gotta toss some money in there now." No, but one other comment is, if you're a 
data scientist who can't really afford a book like this, just reach out to me on Twitter and I'm happy to send you an electronic copy, a PDF or something. That's awesome. Thanks, Tyler. Thanks.
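For reference, the wide, two-column layout described in the captions could be sketched as below. This is only an illustration: the function name render_two_columns and the subheader texts are made up, and the Streamlit module is passed in as the st argument (an unusual choice, made here so the layout logic can be exercised without a running app). The calls used are st.set_page_config, st.columns, st.title, st.subheader, and st.write from current Streamlit releases; the version used in the 2021 recording may have exposed columns under a beta name.

```python
def render_two_columns(st, left_df, right_df):
    """Lay out two tables side by side; `st` is the imported streamlit module."""
    # Page config is set before any other Streamlit call in the script.
    st.set_page_config(layout="wide")
    st.title("Top commenters dashboard")

    col1, col2 = st.columns(2)
    with col1:
        # Anything written inside this block lands in the left column.
        st.subheader("Top commenters, all time")
        st.write(left_df)
    with col2:
        st.subheader("Top commenters, last 30 days")
        st.write(right_df)


# In an app script (hypothetical data frame names):
#   import streamlit as st
#   render_two_columns(st, all_time_df, recent_df)
```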
Info
Channel: Data Science Jay
Views: 2,497
Rating: 4.7014923 out of 5
Keywords: streamlit, python streamlit, tyler streamlit, streamlit book, building apps for business, building dashboard, streamlit tutorial, streamlit walkthrough, streamlit python, python dashboards, matplotlib, data visualization, business visualization
Id: q3Q9mOgQgP0
Length: 38min 0sec (2280 seconds)
Published: Wed Sep 01 2021