Adrien Treuille: Turn Python Scripts into Beautiful ML Tools | PyData LA 2019

Hi, my name is Adrien, and I'm here to talk to you about a brand new open-source framework called Streamlit, which my company launched just two months ago. We were in private beta for about a year, working with companies like Uber, Stitch Fix, and IBM Research to develop Streamlit, and we finally released it about two months ago. It's been kind of a wild ride: there's been something like exponential growth, and there's a lot of activity on Twitter where you can see people building cool stuff in Streamlit. This is actually the first time that we've talked about it publicly. So first of all, show of hands: how many of you have heard of Streamlit? Cool. All right, have you guys used Streamlit? Okay, awesome. Well, nice to meet you in person, and for the rest of you, I'm excited to show you what we're going to do here.

The outline for today's talk, or today's demo really, is this. I'm going to first tell you a little bit about the space in the Python data and machine learning ecosystem that we see Streamlit filling and the problem that we're trying to solve. Then you're all going to be able to install it, if you want, and play with it. Then we're going to build a cool app, and then I'm going to tell you a little bit about some of the really deep motivating examples for using Streamlit.

To motivate that, I'll tell you a little bit about my background. I was a professor at Carnegie Mellon University for eight years, and I did a whole lot of machine learning and big data stuff. Then I went to Google X, where I ran a machine learning group working on natural language systems. Then I went to Zoox, which is a four billion dollar self-driving car company, and again ran a big machine learning group there. I saw at all of those companies the same kinds of problems happening over and over again, and so that's why I started
Streamlit with some friends in 2018. Let me also say, by the way, that given that my roots are in academia, I strongly feel that it's way more fun if people ask questions at any time. So please raise your hand and ask a question whenever you like; it's just more fun for everyone, including me. Any questions? Okay.

All right, so what was this problem that I was seeing over and over again, and that hopefully you might be aware of too? When people typically talk about the industrial machine learning workflow, at Uber or somewhere like that, what they're saying is that we have some data, we perform training on that data and build models, and then eventually we put those models in production. It's all very complicated, and we have to build lots of software to support this entire ecosystem. This is definitely a very real, difficult, important part of the machine learning pipeline. In fact, there's a whole crop of amazing startups in this space right now, working on different aspects of this pipeline and trying to be a horizontal layer or take off different vertical pieces. So I'm actually really excited that in a couple of years this whole process is going to look totally different, and way less bespoke, than it does today.

But notwithstanding this major production pipeline, the thing that I really saw as a machine learning practitioner, and saw again and again on teams, is that machine learning engineers are actually app makers. That is, they often build a large number of bespoke apps. Some demonstrate data; the previous session in this room actually showed some simple Panel apps that do that. Some provide the interstitial structure for the machine learning group: internal tooling to parse large datasets, coordinate with other members of the team, and see how models run on different datasets. And then also to
project the power of the machine learning group out throughout the organization. We're working with a big company right now, for example, to allow machine learning engineers to directly build apps that empower a hundred thousand salespeople to make real-time recommendations. So these are the kinds of apps that we saw being built again and again by machine learning engineers, and we really saw that this was a major pain point in the process.

Okay, why is this a pain? First of all, before I get started: do you guys feel this? Have you been building small apps in Jupyter notebooks, or perhaps with Flask, to solve problems like those I described? Maybe a show of hands. Okay, cool. Did you build it in Jupyter? Okay. Flask? Okay, cool. Plotly too? Okay, you've built it in everything. RStudio Shiny? Okay, cool.

So what typically happens in this case? I'm going to give you the example of Zoox. We had about 80 machine learning engineers building self-driving cars. This is everything from the planning system to vision and pedestrian detection, the entire self-driving stack basically. An example of a tool that we would have to write would be something that would allow us to take two simulations of the self-driving car, run them at the same time, and compare them to one another. Typically this kind of tool would start off as a solo programming project for a single engineer. He or she is doing some work, maybe in a Jupyter notebook, actually running the simulations. Lo and behold, this is the typical engineering cycle: gosh, I'm doing this over and over again, I should automate this more and more. So maybe we copy and paste it into a script and check it into GitHub, and now there's a way of running this app. Inevitably what happens is that if the tool is important, if a lot of people need to use it, all of a sudden it becomes a central focus point for the group, and now we need to add new features every single
week. The app wasn't designed to be a well-designed app that you can readily add features to, and so you get into this unmaintainability trap.

So then what we would do at a certain point, if a tool was really used by a lot of people on the team, is call the tools team. This was an internal infrastructure team for the company, and they were really subject matter experts in building web apps. They had expertise in React or Vue and in how you build the entire stack, and they would follow the standard app-building flow, which is to collect requirements, lay things out, wire them together with the various events, and then code it up. At this point you would get a very beautiful app, usually very fast, with all the correct features and so on, which was amazing. But then they'd say, "Okay, we just built your app for your team. We'll get back to you in a couple of months, because we're supporting ten other teams." So what we got to was this frozen zone, where the problem was that machine learning engineers either couldn't edit these apps, or I should say really didn't want to, or they did but it was really time consuming.

So that's really the observation about the state of this really crucial workflow in the world. With that background, I'm now going to take you through what Streamlit is, and then we'll all get to play with it a little bit. Any questions or thoughts at this point? Thumbs up? All right, cool. Oh yeah, you have a question? Oh no, you're just giving a thumbs up. Okay, cool.

All right, so what is Streamlit? Streamlit is an app framework specifically for machine learning engineers and data scientists. The starting point for Streamlit was very different from, I think, most app frameworks out there. We basically asked ourselves: what if we
could make building these kinds of internal tools and web apps that machine learning engineers and data scientists build as easy as writing Python scripts? Our fundamental assumption is that you are a Python programmer, that you are writing scripts in Python, and that you use the traditional script-writing flow, which is iterative execution from top to bottom. Of course that can be changed in a Jupyter notebook, and that's okay too. What you're not necessarily doing is starting with a layout and then trying to figure out an event model. It's much more of a dataflow transformation model in terms of how you build the apps, and what we wanted to do was build an amazing app framework that really started with this Python scripting idea.

So this is the app that I'm going to show you. We're not going to build all of it, but we'll build some of it at the very end. It's a data browser for self-driving car image data. In fact, there's a little semantic search engine up on the upper left there, on the right, where you can select a number of semantic features in the images, like the number of pedestrians, and scrub through them. Then you can actually run a neural network, in this case YOLOv3, just as an example, in real time on this dataset. This entire app is 300 lines of Python, and in fact there are only thirty Streamlit calls in the entire app, so the other 270 lines of Python are actually the neural net and all of the data processing. You can imagine this as basically a data script which has been slightly annotated to make it an interactive app, and that's really the goal here.

Okay, questions? Thoughts? I'm going to start asking myself questions. "What's on the next slide, Adrien?" Okay. We are now going to, if you're interested, install and play with Streamlit. So get out your laptops. Actually, before you get out your laptops, raise your hand if you're able to install packages with
pip on your laptop. Okay, cool, everyone. Sweet, I'm at the right conference. All right, in that case, get out your laptops, open a terminal, and we're going to do the following, which is pip install streamlit. By the way, this is an open-source project on GitHub under the Apache 2 license, and you can look through all the source code, so hopefully you'll feel comfortable with pip install streamlit. I will do it myself. Here's my text editor, and I'm just going to do pip install, in my case with upgrade; you don't have to do that. Okay, and now you can run, for example, streamlit version, just to see that things are not broken. It should be 0.51. Who has 0.51? Okay, very few people. Okay, more. Go on, I'll wait here. Raise your hand if you want more time.

Okay, yeah, the question is: realistically, when can we buy autonomous cars? I'll give you the same answer I would have given you five years ago: in three years. I'll give you another answer too. This is unfortunately not going to satisfy your desire to buy a self-driving car for Christmas, but it is an answer. It turns out that when I was on the faculty at Carnegie Mellon, in the computer science department and the Robotics Institute, a friend of mine was Chris Urmson, who went on to lead the Google self-driving car project, now called Waymo. I was like, "So tell me what's going on with this self-driving car thing," and he said, "Well, it's really simple. What the car has to do is draw a lot of boxes around things and then not hit those boxes. It really just comes down to how many nines we have on those two tasks, and each nine is getting harder and harder to get."

Okay, who needs more time? You need more time? I can keep going. Okay, so: pip install streamlit, and now streamlit version, 0.51. Now here's the fun thing to do: streamlit hello. What's going to happen is that it's going to open a little web server on your computer, and here we see Streamlit running. We have this little thing,
a doohickey here on the left that comes out, and these are a couple of different demos, so I'll show you some cool ones. Here's an animation demo: fractals. Down here it actually shows you the code that it took to make this demo, and interestingly, it's all the fractals; I think there are only one or two little Streamlit calls here. We have a little rerun button, and, I'm going to get to this in a moment, you can change things about the fractals. So we could do that, for example, and now the fractals are more separate.

Here's another example, this plotting demo. This basically just shows a random walk. I'm using Altair, which by the way is a super rad graphing library made by some good folks at Google, based on Vega-Lite, based on D3. And let's see here, the mapping demo. This one's fun. By the way, I'm just going to rep random sweet graphing libraries: deck.gl by Uber is so cool. I'm going to zoom out a little bit so we can see everything at the same time. Okay, so here we can turn different layers on and off really fast, which is pretty cool. And again, this is the code right here, and most of this code is just deck.gl. There's almost no Streamlit needed to do this, and I hope you get the theme that I'm getting at here, which is that I'm hopefully going to give you a cool superpower that you can use in your own workflow.

I should also mention, by the way, that this is not only free and open source, but it runs entirely locally on your machine. For example, this is just running on my laptop, so there's no data leaking out or anything like that. We do collect analytics, but you can turn that off; that's so that our investors give us more money. But we don't know anything about what you're doing. In fact, if you turned off the internet, which I won't do now since the demo gods will get mad at me, you could turn
off your internet and this would all just continue working perfectly. There's no cloud thing going on here. Cool. In fact, you'll see that this is actually localhost:8501.

Yeah? Why is boto3 a dependency? Remind me what boto3 does again. Oh yes, yes, that's a great question. So, okay, if you set a setting... actually, we should probably get rid of that dependency. If you set a setting in your config.toml file, you can save your Streamlit reports to a cloud server, and the way that we configure that is on an S3 bucket, for a corporate customer actually. For example, at Uber we set it up so that they could save Streamlit reports to an internal S3 bucket that they had, and boto3 is a reflection of that. But I could actually pip uninstall boto3 and everything should still work. Again, I'm not going to do it now, because I don't want to upset the demo gods.

Oh yeah, okay. And by the way, that's a great question; I'm going to repeat the questions too. So the first question, or the last question anyway, was about boto3: why do we use that as a dependency? Indeed, it's because there is functionality, which is off by default, that lets you upload your Streamlit apps to an S3 server. The second question was what charting libraries we support. The shout-out here is actually to Jupyter, because Jupyter means that there are tons of awesome Python bindings for all kinds of awesome things, and we can essentially plug them into Streamlit practically for free. So I suggest that you go to the docs, which are right here at streamlit.io/docs, and it shows you everything: deck.gl, Plotly, Matplotlib, Bokeh, HoloViews, Altair, Vega-Lite. I could probably think of others.

Yeah? The question is, can you handle events? Yeah, I'm going to show you how to do interactive stuff. Cool. Uh-huh, yeah, so the question is, is this... is there
anything at all more here than a Shiny app? Well, we see this playing a similar role in the Python ecosystem to the one Shiny plays in the R ecosystem. As you'll see, there's a very different programming style to Streamlit, so when we get into that you can see whether you like it or not. Cool.

Okay, yeah, right. So the question is, how do you deploy a Streamlit app? Right now we don't offer a solution to that. You can go online and find probably one or two dozen tutorials about Streamlit on EC2 and AWS and Heroku and all these things, so there's actually a growing body of information written by the community on how to deploy Streamlit apps. We are also developing a deployment solution called Streamlit for Teams, and that's going to be the enterprise version of Streamlit. That's how we will eventually make money, although we're not making money now.

Right, yeah, so the question is, what about GCP or other cloud providers? Right now you'll find tutorials written by the community online for every major cloud provider, and also for things like Docker containers and so on. It is a bit of an involved process right now, and I apologize for that, but if you're fluent in those technologies you can get it done. We encourage people to write more tutorials if you come up with a better way of doing it. We also hope to make that a really amazing experience for our users, but that will come down the line.

Yeah? Can you [inaudible] back to the cluster? Okay, I'm going to have to keep moving after this question, but thank you for asking questions. Yes, you can. For one thing, Streamlit is just pure Python, so streamlit run basically just says "run this script." You'll see what streamlit run is, if you haven't gotten there: it's the same thing as python, but it means run it
as an app rather than as a script. In that sense, the entire Python ecosystem is instantly available to Streamlit, so anything that you could import in your environment, Spark or whatever it is, is available to you as a Streamlit app developer. Then there's another question about how you deploy to clusters, which is something that you can read tutorials about and also something that I can talk about after the talk. But I think I should keep moving, because there's a lot more cool stuff I hope to show you. Any more qu... well, no, no more questions for another minute or two.

Okay, so that's streamlit hello. Let's see here. Did anyone have any trouble installing? Cool. Yeah, two people. Yes? Oh, watchdog. Okay, are you in Conda? You're not in Conda. What operating system are you on? Okay. Watchdog is the bane of our existence, basically. It turns out that it needs to be built, because it uses some operating system resources, and it doesn't always compile properly; we might just push some upstream changes to it. My suggestion is to build Streamlit in a virtualenv or in Conda, and it should work. So I apologize for that. There is also a way of disabling watchdog, which I am forgetting at the moment, so I will think about that and get back to you.

Okay, let's get moving. So now, and this goes to your question about Shiny, let's talk about how this works. Our goal was to make someone who is comfortable writing Python scripts immediately have the superpower of being a Python app developer. How do we do that? Streamlit works on three simple principles, and if you understand those principles, you understand all of Streamlit.

Number one: embrace Python scripting. Streamlit apps are actually just scripts that run from top to bottom. So everything that you can do in a Python script, factor things into functions, write it in your favorite IDE,
whatever, it all works perfectly in Streamlit. So this is a little hello world, which we'll write in a second, and there it is executing as an app.

Number two: create widgets as variables. Here in my script you see that I've inserted this Streamlit function called st.slider. Any time you have a variable anywhere in your script that you'd like to make interactive, you can just substitute st.slider, or any kind of widget that you want, and it automatically assembles the UI so that the variable is now something that can be changed interactively.

And finally: reuse data and computation. I'm going to talk about this thing called st.cache, but what it allows us to do is play this trick of rerunning the script over and over again without it being insanely slow, by caching computation. So those are the basics of Streamlit.

Now, to actually play around with this: who got Streamlit installed on their computer? Raise your hand. Oh sweet, yes, high numbers. Okay, I'm going to ask you to go to a GitHub repo and just copy a little bit of code. So go to github.com/streamlit. I'll do it myself over here. There it is, github.com/streamlit, probably way too small for you to see. Then we're going to go to demo-uber-nyc-pickups. Really the only reason why we're doing this part is that there's a little snippet of code to download the data that's super annoying to write, and I would not want to describe it to you character by character. So here we are, and then we're going to click on app.py. Now I'm going to zoom in a little bit. Let's grab everything from line 47 up. Oops, we don't need the whole license; we can start at import streamlit as st. What's that? 47? Oh sorry, 18 to 47. Really we just want 38 to 44, to be honest, that's the gnarly part, but just grab this whole thing. No, we also need to get the bucket. Okay, so I'm going to copy this and I'm
going to go over to my text editor here. Oops, I already pasted it in. Here we have lines 1 through... now it's 1 through 29. So raise your hand if you are at this point. Okay, cool, we had like a 90 percent drop-off. So the point is, I have these first 29 lines in a text editor, basically like this. This is VS Code; I have VS Code on the left.

[Audience question.] No, so we have a different model than the reactive computation that you'll see in Shiny. You'll see it's not very complicated, and we're about to get to it, actually. So yeah, the question was, what's the relationship between caching and reactive computation? There are some deep connections, but we won't go too much into that now; I think you'll see how it works by demonstration.

Okay, are you all here? Raise your hand if you need a little more time. Okay, so this file, I'm calling it hello.py; you could call it anything you want. Here in my little text editor I'm going to say streamlit run hello.py. Okay, so what's going on here? A couple of things. First of all, we have this little title thing at the top. There are actually very few function calls in Streamlit, so it's pretty easy to learn. We have st.title, or another way that we could do that is to just say st.write. And now when I save my file, and this is a little trick that we're borrowing from web development, it's going to scan the file and say, "We saw a change in your file; do you want to update what you're seeing on the right?" I'm going to say yes, I do want to do that: always rerun. Okay, and now you see that "Uber pickups in New York City" got smaller. This is actually Markdown, so we can make it big again, or we can make it a little bit smaller. Oops. The other thing that we could do, by the way, if you're in Python 3: any time Streamlit sees a Python object which is not a function call,
it wraps it in st.write. So we could also just go like this, which, okay, so that's kind of nice. We could also get rid of this Markdown. Yes? Okay, I'm just showing some random stuff; you don't need to follow along with this. I'll tell you when to follow along again. Okay, cool. And we could add some [inaudible]. Okay, cool. I'm actually going to delete this, because who cares.

So now here's where things get interesting. We're going to load a little bit of data from the web, and then we're going to do some non-trivial computation. Well, this is not mind-blowing computation that will get you a PhD, but it actually takes a little bit of time, which is what I mean by non-trivial. First of all, we're going to load a hundred thousand lines of this data. These are Uber pickups in New York City; it's a fun little dataset you can get online. We are going to rename the columns, and we are going to convert one of the columns from a string to a datetime. Believe it or not, well, you probably know this, it's actually a very time-consuming thing to do; it's a high-entropy code path to turn a string into a datetime.

Okay, and now we can take a look at the data. I'm just going to say data, and I'm going to tell Streamlit to display it. Here we're looking at these pickups. So what do we have? This is the date and time of the pickup, this is the latitude, this is the longitude, and this is something called a base; we have no idea what that is, but it's always the same thing. Typically a script like this would be slow to rerun, because we have to go to the web, download all this data, and do all these transformations. So why is it not slow in this case? It's because of this st.cache annotation. Actually, we don't need this either; you can keep it if you want, but I'm going to delete it. So here it is running. What this annotation does is basically say: store this data, and then every time I see this function call again, if I already know the answer,
just substitute it back in. So for the major computer science nerds, this is just memoization, but in this case it means that we can now play with this dataset basically interactively, even in the context of this script. For example, we can look at only the Uber pickups at noon. I'll say hour equals noon, and then we'll do a little filter on date/time, dt. Okay, and now we're looking at only the Uber pickups at noon. If I change this to 11, and this is the beginning of the Streamlit magic, it updates instantly.

Okay, is anyone following along at this point in terms of actually coding? Yes, there's a question right there. Okay, you're seeing a blank page. My guess... okay, this is what I would do. I'm going to get rid of everything here except for this Streamlit import, pretend it's not there, and we're just going to say st.write "hello world". Oops, what's going on here? Oh, here we go. I would just try these two lines of code, and if that doesn't work, then we'll have to think more deeply.

Yes? Oh sorry, yeah, I was speaking very quickly. So the question is, what's this thing about wrapping stuff in st.write? First of all, you can just not rely on that and use st.write ordinarily, and everything will work. There's actually a slightly more complicated description of what gets automatically st.written: it's a naked object sitting in your buffer that's not a function call, not a comment, not an if statement or a while statement or any other kind of syntactic construct. So, for example, I could say True and it'll say True, but if I said "if True" it won't say anything, because "if True" doesn't fall into the special category of things. But if I say "if True: 42" then it'll do that.

Last question for now. Yeah? Okay, so the question is really about what this st.cache thing is doing. I think what you're asking is, what happens if you mutate the object that comes out of st.cache? Is that the question? Huh.
Yeah, yes. It's an important constraint that the same thing will always happen every time your script runs if you don't change anything. So I would be surprised if that weren't the case, but that's certainly the mental model, and I could look at your code and see what's going on. Just to give you a sense here: if I say data and then len(data), I have thirty-three thousand nine hundred. If I change it to 11:00 a.m., I have three thousand nine seventy-seven. If I change it back, it goes back. So that should always be the same. Okay, is your question answered? Okay, cool. Yes? Good, your hello world worked? Yeah, okay.

I think it's going to be impossible for everyone to follow along perfectly with the code the whole time, but if you can, please do, and if not, I think you can get the gestalt of what's going on. Any other questions? Cool.

All right, so where we are now is that we have this ability to quickly filter through this data just by changing one variable here. So there's 10:00 a.m., 11:00 a.m. Okay, let's do something more fun, which is to actually map the data. Let's call this the raw data. Okay, you don't have to do exactly what I'm doing here. So this is the raw data at 11:00 a.m.
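
The caching idea described above ("this is just memoization") can be sketched with nothing but the standard library. This is an illustration of the concept, not Streamlit's implementation of st.cache: functools.lru_cache stands in for the annotation, and load_data, count_at, the CALLS counter, and the synthetic hours are all hypothetical names and data invented for the example. Returning an immutable tuple is one simple way to sidestep the question of mutating a cached result.

```python
import functools
import time

CALLS = {"load": 0}  # counts how often the slow function body actually runs


@functools.lru_cache(maxsize=None)
def load_data(nrows):
    """Stand-in for the slow download-and-parse step (hypothetical data)."""
    CALLS["load"] += 1
    time.sleep(0.05)  # pretend this is a network download plus datetime parsing
    # Synthetic pickup hours, returned as an immutable tuple so that callers
    # cannot accidentally modify the cached value.
    return tuple(i % 24 for i in range(nrows))


def count_at(hour, nrows=100_000):
    """Filter the (memoized) data down to one hour, as a rerun of the script would."""
    return sum(1 for h in load_data(nrows) if h == hour)


# Each call below plays the role of one top-to-bottom rerun of the script.
noon = count_at(12)          # first run: pays for the slow load
eleven = count_at(11)        # later runs: the load is answered from the cache
noon_again = count_at(12)    # same inputs, same answer
print(noon, eleven, noon_again, "slow loads:", CALLS["load"])
```

The slow body runs once no matter how many times the filter value changes, which is what makes rerunning the whole script on every edit tolerable.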
Now let's actually map it. We'll put something here, we'll say geo data, and now I'm going to use a new visualization function, which is st.map. Okay, now we're actually looking at where these pickups are happening. This is using deck.gl, by the way, so this is pretty cool.

Now we're going to get to how you would play with interaction. The first thing to note is that basically all I've shown you so far is how to write scripts that output something to a web browser super quickly, and hopefully without too much difficulty. But this is not what you would call an app yet. On the other hand, it is very much in the style of an ordinary Python script: we load some data, we transform it, and so on.

So now we're going to make it into an actual app. As I mentioned, let's substitute a widget for this variable. So it's called this: hour, and here we go, min value, max value. So 0 to 24, which, I think it's 23. There we go. I'm going to zoom out a little bit so you can see this better. Okay, and now you'll see that as I change this, everything updates. It's not rebuilding the web page from scratch every time; in fact, it's very intelligently diffing what was in the web page against the data that your app has, and then sending only the deltas that are required. That's one of the reasons why it's so fast.

And had I not known how st.slider works, and I'm just randomly riffing here, we could actually just take a look at st.slider, and here is the documentation for it. Just a fun little trick. Another fun little trick: it's kind of annoying how this thing is right there, so maybe I can move it to the left. Instead of saying st.slider, there's another area you have access to, which is called the sidebar: st.sidebar.slider. And now we've got a nice little sidebar there. Zoom out. Cool, right?
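
The steps just described (filter by an hour variable, swap that variable for a slider, move the slider into the sidebar, and map the result) might look roughly like the sketch below. It is not the code from the talk: the data is synthetic, make_pickups and pickups_at are hypothetical helpers, and the file name app.py is an assumption. Only st.title, st.sidebar.slider, st.write, and st.map are Streamlit calls. The import is guarded so that the plain-Python helpers can still be exercised where Streamlit is not installed.

```python
import random

try:
    import streamlit as st  # the real library; everything else here is illustrative
except ImportError:
    st = None  # the pure-Python helpers below still work without Streamlit


def make_pickups(n=2000, seed=0):
    """Synthetic (hour, lat, lon) rows standing in for the Uber pickup data."""
    rng = random.Random(seed)
    return [
        (
            rng.randrange(24),                    # pickup hour, 0 to 23
            40.75 + rng.uniform(-0.1, 0.1),       # latitude near New York City
            -73.98 + rng.uniform(-0.1, 0.1),      # longitude near New York City
        )
        for _ in range(n)
    ]


def pickups_at(rows, hour):
    """Keep only the rows whose pickup hour equals `hour`."""
    return [row for row in rows if row[0] == hour]


def main():
    import pandas as pd  # pandas is installed alongside Streamlit

    st.title("Uber pickups in New York City (synthetic)")
    rows = make_pickups()
    # The widget is just a variable: the script reruns top to bottom
    # whenever the slider moves, with `hour` set to the new value.
    hour = st.sidebar.slider("hour", 0, 23, 11)
    selected = pickups_at(rows, hour)
    st.write(f"{len(selected)} pickups at {hour}:00")
    # st.map looks for columns named lat/lon (or latitude/longitude).
    st.map(pd.DataFrame(selected, columns=["hour", "lat", "lon"]))


if st is not None:
    main()
```

Saved as app.py, this would be started with streamlit run app.py. There are no callbacks anywhere; moving the slider simply reruns the script with a different value of hour.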
And now, this map is a little boring. It so happens that deck.gl is super amazing and insanely customizable. Rather than actually type in all kinds of crazy deck.gl, I'm just going to copy it from GitHub, because it looks cooler that way. We're going to replace that one line with maybe 15 lines of code. Okay, and now we have a really cool 3D visualization of all these different things.

And by the way, if we didn't like a slider, we could make this a... what could we make this? Oh, I think we have something called a num input now. This is actually cool. No. Well, you guys get a slider for now; I can't remember what it's called. If someone... yeah? What's that? Oh, I type "number". Oh cool, there we go. So now we have the same thing but with a different kind of widget, for example.

And then here too, maybe we don't always want to look at this raw data; it's more of a debugging thing. So I'm going to show you another cool thing we could do. We're going to use a different type of construct: if st.checkbox "show raw data". Okay, so now the data is kind of hidden back there, see that? Notice that we're not writing any callbacks and we're not doing anything declaratively. We're really just instrumenting a script with these st calls, and what they do is allow us to control the control flow at the same time as they insert UI elements, which produces this script that you can play with and share with others.

In case anyone's wondering whether you can run Streamlit in a JupyterHub instance: many people have asked this, and the answer has been no until about five minutes ago, when mathematicalmichael.com just demonstrated that it is possible. I have no idea what kind of magic was involved in doing that, but that's super awesome, so hopefully you'll share that with the community. Cool. All right, good, hopefully everyone's feeling a little stretched out, so let's
build a data app. Okay, so in this final segment I wanted to share with you a little bit about a more advanced use case for Streamlit. Even though the fundamental building blocks are themselves extremely easy (hopefully; if not, you should let us know, because we're constantly trying to make them easier, I'm not kidding), even though they're hopefully not super sophisticated and there's not a huge amount to learn about Streamlit, you can actually build really non-trivial apps quite quickly. And so I'm gonna tell you a little bit about one of the really driving use cases behind the creation of Streamlit, which was my experience at Zoox working on self-driving cars. I was an engineering manager there, and all of the engineering managers would meet once a week, and we would basically look at instances where the car hadn't behaved as we expected. So this was where the operator of the self-driving car had pushed a button or whatever, figuratively speaking, that caused the car to disengage. And then that was an issue that had to be resolved that week, basically. So in a sense this workflow was similar to the software engineering workflow, where you have issues coming in, and then you go into GitHub or JIRA or whatever, and they get assigned to people, and then they get fixed. But in this amorphous, crazy world of semi-intelligent vehicles and machine learning, it wasn't that simple, right? You couldn't root-cause a neural network. So what was the actual flavor of debugging a self-driving car? What it really was was the engineering leaders sitting around a table, and we would open up (this is Waymo, just copied off the web, by the way, but same idea) we would open up an instance. It's a little hard to see there, but in this case what we have is, it looks like, someone in a wheelchair following a dog around on the street. And this is the kind of thing, and this is actually quite an exotic example,
but this is the kind of thing that happens all the time when you're debugging self-driving cars: every single week, no matter how many years you've been working at this, some crazy thing happens that you've never seen before, and you're like, how is the car gonna behave in this case? So that's perhaps another answer to the question of why this is taking so long: there are tons of weird examples like this. Maps are like that too, by the way. Every time you think you've made a map that's centimeter accurate, some crazy thing happens in the world and your map is broken. Okay, so what we'd do is we'd say, let's figure out, at this point... this is very complicated, you see; it's not just the people in the room, there are project managers and stuff, there's this entire ecosystem working on this problem. People are trying to say: at this point in time, can we understand what all the sensors were doing, what state all the neural nets were in, et cetera, et cetera, the planner? And then can we get it, in some sense, to an engineer or an engineering team who can break it down further and try to quote-unquote reproduce this case? But there's no such thing as reproduction as such. I mean, you can replicate the state of everything on the vehicle, but that's only one instance in time; what you're trying to get is a more general understanding of what broke down. What you really want to do is create essentially searches over your data set for similar instances in time, and then you want to be able to essentially regress those searches over your data set against different versions of the software running on the car. So the picture here is that we have this data, which is vast, it's terabytes or exabytes, god knows what, of data, and we want to find subsets of it that are similar to this error case. And then in the other column, quote-unquote column, or let's say dimension,
we have all the models. I say quote-unquote models because really there are many, many models running simultaneously on the vehicle, but again we're gonna sidestep that and just think theoretically. So we have data along one dimension, we have models along another dimension, and we want to be able to quickly subset some of these and then look at the intersection: how would the models run on this data? That's the beginning of the process of debugging some kind of crazy thing that's happening on a self-driving car. And by the way, when I was at Google X, I wasn't working on the self-driving car team, but I was close to them, and this is the exact same process in every project that's working on self-driving cars, no doubt. And so what happened was the engineering teams eventually, through this complicated, slow process, built internal tooling that made this better and better, and week after week more features were added, until we actually had these really beautiful tools that allowed us to interrogate our own data and then run models against it. And this kind of internal tooling is being replicated not only in every self-driving car project but also really in every machine learning project on some level, especially once you get to a certain size team. So this is an example of the hidden, bespoke internal tooling layer that floats beneath the amazing mathematics and futuristic technologies of machine learning. And so we built an example of this tool in Streamlit. Now, it's a very small, simplified example, but it gives you some idea of how this works. I will invite you to go check this out on GitHub. I'm not going to invite you to run it, because it's going to download a three-gigabyte neural network. Maybe it's five hundred megabytes, I can't remember. Well, I'll tell you right now. Let me bring up my thing. We'll close the Streamlit browser. Oops. How do you interrupt it? Oh, just Control-C. You can't
interrupt yours? Okay, well, I'll help you with that afterwards. By the way, you can also open another terminal and kill it if all else fails. Okay, it's 200 megabytes. The point is, if I tell you guys to do this right now, all at the same time, you're all going to simultaneously download 200 megabytes, and that would probably just kill the Wi-Fi and make it horrible for everyone, so I don't take responsibility for the PyData LA Wi-Fi system. So I already have it downloaded, and afterwards you should feel free to do what I'm about to do: try it at home or try it in your hotel room. So here, let's go back to the Streamlit GitHub. By the way, this is Streamlit itself, and as I mentioned it is an open source project. We're actually getting a whole bunch of contributions from the community. Not a huge number, we've been public for about two months, but I think we've gotten 15 or so pull requests, which is super exciting. And then there's also lots of other stuff on the forums and Twitter and all that, as you can check out. It's Apache 2.0, and you can play with all this stuff and look through it. But more importantly, what we're going to do is go to this demo-self-driving thing, and here is just another little example of Streamlit. I could download this file, but we have a cool little feature, which is that you can just run a Python script directly off any URL. So really it is running locally on my computer; it's just downloading it first and then running it. It's just a little time saver. So we're gonna run this baby. Boom. Okay, all right, and so here we are in the app, and I'm gonna go down here to "Show the source code". Honestly, the point is that this is the entire source code. It's 300 lines, and if you look carefully... okay, let's actually go into this source code and zoom in. You'll see that literally this is everything that you're seeing, including actually running the neural net somewhere
around here, layer by layer. Okay, the whole thing is happening, and we're not hiding things in other libraries or something. So this is the source code; again, it's 300 lines of Python, only 30 Streamlit calls. And there are some Streamlit calls right there. Okay, so let's run this and see what happens. Here we go to "Run the app". Okay, it's just doing some st.cache stuff. Load metadata, by the way, is building this little search engine over the data set. This is the Udacity self-driving car data set, so it's just a nice data set of images with boxes around them, basically. And so here we can start to say, let's say that we ran into a situation where there were a large number of traffic lights. So let's go to here and we'll say traffic lights, and then we'll go for a large number of traffic lights, and here there are only six images in the data set with over fourteen traffic lights. Of course, who's ever heard of over fourteen traffic lights? But here we can scrub through it. Let's go for pedestrians. You know what, actually let's make this way bigger, because it's more fun. We can get into more normal levels. Okay, so there we've got a lot of pedestrians now, and here we can scrub through it. And notice all of this is written without callbacks, in the same exact data flow style that I described to you earlier. It's just a script that's run from top to bottom, but it gives you this really convincing illusion of an app, you might say. And then, notably, down here it's actually running a neural net in real time. And just to prove that, here we can change parameters of the neural net itself. So you'll see that as I increase the confidence threshold, more and more people get classified as people, and then as I decrease it, fewer and fewer. And then similarly there's this thing called the overlap threshold. So these are actually parameters of the net itself. So this net is
pre-trained, so really it's not parameters of the net, it's parameters of the post-processing that happens after the neural net is run. So all these objects are detected using YOLO, which is a super awesome object detector, by the way, and then you do a little post-processing to figure out what's the overlap threshold of the boxes and stuff. And these are the kinds of... you know, I don't need to tell you, there are millions and millions of tiny little parameters inside the neural net system, which are extremely difficult to debug and really gain intuition about at all. And so you can imagine that we would have not only this one model but perhaps thousands of versioned models that we could then run against this data set. So this is an example of an app that, on the one hand, is not very complicated; it seems almost formulaic, perhaps: just grab some data, grab a model, run one against the other. But on the other hand, there are a lot of very minor bespoke things about it. And so the ability for a machine learning engineer or data scientist to build this him or herself directly, open it up and fiddle with it, and then make it available to other people inside the organization, to their co-workers, to their executives, to their interns, and potentially far beyond the machine learning group, was really what we were trying to go for when we created Streamlit. Hopefully it's something that you guys are also resonating with, as both part of your workflow and also a skill that could be useful to you. So, quickly, yeah? Yeah. So it was more of a statement than a question, which is that this would be really useful to serve a model, not in the sense of, let's say, an API endpoint, which is not the point, but actually as a data app, as it were, with inference. Yeah, and that is very much one of the use cases that we had in mind, so that's exactly right. I would also encourage you guys, if you're interested, go
to Twitter and type in "streamlit". You'll find that people have put up thousands of different apps. You can also search on GitHub for Streamlit, and there are a lot of interesting things that people are doing now. So there are those kinds of "deploy an app and run inference" of some kind. There are people who are just demonstrating their GitHub repos; we're seeing that more and more, like, "Oh, if you want to see how this thing works, just pip install streamlit and you can run, you know, blah-dee-blah." And then there are annotation tasks; there's just a whole bunch of weird science stuff going on in Streamlit, which is really delightful, and it makes it super fun to go to work. Okay, another question? Yes. Okay, so the question was, could we just render this down to an HTML document and remove the back end? So that's a great question. Yes, and that's what the boto library is for. If you go to streamlit.io/docs, we describe it. So this has only been out for two months, and we're going to make it cooler and cooler and cooler. Right now there are some things where, a little bit, you have to just do it yourself. In this case, if you go to the config file, you can configure it so that you can save HTML files out, and you can save them to S3 buckets, and that's what boto is for. So you cannot save outside of S3 buckets now, because that's the only use case that we coded. That being said, it's an open source project, and that's a very great use case, and it's not a big delta from what we have now. In fact, that part of the whole program is actually a plug-in system, so you could have a file system save or you could have a GCP save; we've just only written one plugin for it so far. So there are a lot of things like that in Streamlit where we have really big plans, and we'd love to engage the community in helping us build out all these things and also gain ideas for how to build things. Incidentally, I'll mention that almost
every good idea in Streamlit just came from people who were using it, especially during this year-long beta period. So there was a really long private beta period where we just got tons of ideas from the community. Now that things are growing, we're trying to scale up that same process, but to many more users. Okay, a question in the back? Yeah. So the question is, is it expensive to run inference with neural nets? And so the answer is yes. Let's see. We actually have a really cool GAN demo that you should check out, where you can morph faces and stuff, and that's a really fun Streamlit application. We haven't put the source code up just because we haven't had time to, but I think we'll probably do it in the next month or so, so you can play with GANs in Streamlit, and it's super fun. Number two, training neural nets is much more expensive than running inference on them. So actually deploying an app which runs inference at the speed of someone clicking is orders and orders of magnitude less computationally expensive than actually training a neural net at the speed of a GPU. And so in that sense it's probably not something to be concerned about in most applications. I will mention that people have been starting to train neural nets in Streamlit, using Streamlit as a front end to that process, almost like a TensorBoard alternative, which is not the main use case for us, but another cool idea. Yeah, that's right, that's right. So the key thing here is that nothing here is running on any cloud server. As in the demo that I just showed you, it's just running on localhost:8501. The only cloud server that's being connected to is one which is measuring a very little anonymized information, like was an app created, or was an app basically interacted with. We use that because it helps us understand usage and also demonstrate to our investors that people are interested in using Streamlit. So you can turn it
off, but we would prefer that you leave it on. Yes? Is it possible to upload data? Oh, do you mean you want to upload data in the UI? Yes, that is a feature that should be coming out next week, which is the file uploader widget. In fact, if you want proof of this, go to GitHub and search the issues for "file uploader widget"; I think it's all been merged in, so it should be coming out in 0.52. Yes? Yeah, so the enterprise version is called Streamlit for Teams, and there are two versions, and we're not sure which one we're going to come out with yet; we're developing it right now. The first one is an on-prem version; that's something that we have running, like, on our servers in our organization. And then there's also a cloud version for typically smaller organizations who don't have a huge on-prem installation. So that's something that we're developing right now. We're doing it with a small number of partners who are existing Streamlit users, but you can actually go online to streamlit.io, slash for teams, and type in your email address if you want to get added to the waiting list, and people will contact you and ask how big your organization is and stuff. We're certainly looking to gain insight into how people might use the deployed version. That's not available yet, but it's coming. Yeah? Mm-hmm. Okay, that's a very sophisticated and fascinating question, which is: what's the next part of the workflow for a machine learning engineer who's doing debugging in the self-driving car use case, and is there a way of maybe plugging it into a regression suite or something? So we don't consider that a part of the Streamlit problem definition as such. We view Streamlit as an app development framework, a canvas for others to create apps in. We chose Python as the launch language for sort of infinite reasons, but
one of them is that there is a very large ecosystem of plug-ins for basically every other SaaS service out there, so the limit's really your imagination. And certainly if you come up with cool integrations with other services, feel free to put them on the web too, as separate open-source projects, or you can do pull requests against Streamlit, and the ecosystem can hopefully get richer that way. Okay, we're just at the end of the talk here, so thank you so much for your time and attention. I'm available to chat with people afterwards, but thank you so much. [Applause]
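The confidence and overlap thresholds adjusted in the self-driving demo are described in the talk as parameters of the post-processing that runs after the network, not of the network itself. A minimal sketch of that kind of post-processing, assuming boxes given as (x1, y1, x2, y2) with a score and a greedy non-maximum suppression rule; the function names and sample boxes are hypothetical, and this is not the demo's actual code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, confidence_threshold, overlap_threshold):
    """Drop low-score boxes, then greedily suppress boxes that overlap a kept one.

    detections: list of (box, score) pairs. Returns the kept pairs, best score first.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= confidence_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with any stronger box.
        if all(iou(box, kept_box) <= overlap_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detector output: two near-duplicate boxes, one separate, one weak.
DETECTIONS = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((20, 20, 30, 30), 0.6),
    ((50, 50, 60, 60), 0.2),
]

if __name__ == "__main__":
    for box, score in filter_detections(DETECTIONS, 0.5, 0.3):
        print(box, score)
```

In this sketch, raising `confidence_threshold` keeps fewer boxes and raising `overlap_threshold` lets more near-duplicates through; wiring the two arguments to sliders would reproduce the kind of interactive tuning shown in the demo.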
Info
Channel: PyData
Views: 7,058
Rating: 4.981132 out of 5
Keywords: Python, Tutorial, Education, NumFOCUS, PyData, Opensource, download, learn, syntax, software, python 3
Id: 0It8phQ1gkQ
Length: 68min 49sec (4129 seconds)
Published: Sun Dec 29 2019