How to Cache Models and Data in Streamlit (Streamlit Tutorials 01.04)

Captions
Hello, and welcome back to this series on Streamlit and Python. In the last video we made an NLP app that used the spaCy large model, a 780-megabyte model, to analyze a default text (which you could change if you wanted) and, based on the parameters in a form, automatically extract different entities from that text; you could select "person," for instance, and it would extract the people.

However, I explicitly stated in the last video that this is very problematic code in Streamlit, and it's problematic for one reason: that 780-megabyte model has to be loaded every time we execute the script. We did optimize the script a little, because, if you remember, we created a form so that the script wouldn't rerun every time we changed one little thing, but only when we clicked the button. Even so, a huge problem remains: even though we only have to click the button, the script still reloads the same model every single time it runs. That is very computationally inefficient, because we're loading a lot of information; notice that even with this one sentence the app has to load up the model, and that's why we're seeing a slow running time here.

So how can we improve this? We can leverage something in Streamlit known as caching: we can cache the model and store it. Models, or really anything that needs to be cached, must be cached in the form of a function, so let's walk through the steps. First we create a function, which we'll call load_model; it takes one argument, the model name, and that's it. The function sets nlp equal to spacy.load(model_name) and returns nlp, basically just what we see here. Now let's work this into our script: we say nlp = load_model(...) and pass in that same argument. That looks nice and easy to understand, right?

However, nothing is different yet: when we change some of these parameters, the app still takes a long time to run, because it's still loading in the model. We haven't cached anything. To cache it, we use a decorator: @st.cache. Now, you're probably going to have errors if you use st.cache the way I do, and in fact, as luck would have it, one appears right now; fantastic, we get to have a learning experience. Depending on the kind of model you're returning, you're going to have a lot of hashing problems, and there are a bunch of different ways to solve them. I'll provide a link in the description down below to the Streamlit documentation, and I encourage you to spend some time with it. For this particular problem we're going to use the parameter allow_output_mutation. I'm not going to get into what it's all about right now; if you're curious, feel free to read the documentation, but I think it's a little tangential for our purposes. We set allow_output_mutation equal to True, and when we return to the app we see "Running load_model", which tells me the function is executing, and then we've got the output.
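Putting those steps together, here is a minimal sketch of the cached loader described above. The model name en_core_web_lg (spaCy's large English model) and the sample sentence are my assumptions, since the video doesn't spell them out; note also that @st.cache with allow_output_mutation is the API current at the time of the video, and newer Streamlit releases replace it with st.cache_resource.

    import spacy
    import streamlit as st

    # Cache the loaded model, keyed on model_name, so reruns of the script
    # reuse the object instead of reloading ~780 MB from disk each time.
    # allow_output_mutation=True tells Streamlit not to hash the returned
    # nlp object when checking the cache (spaCy pipelines aren't hashable).
    @st.cache(allow_output_mutation=True)
    def load_model(model_name):
        nlp = spacy.load(model_name)
        return nlp

    nlp = load_model("en_core_web_lg")  # assumed name of the large model
    doc = nlp("Mark Twain wrote The Adventures of Tom Sawyer in 1876.")

Because the cache is keyed on the function's arguments, calling load_model with a different model name forces a fresh load, while repeating a name you've already loaded is a cache hit — exactly the behavior demonstrated next.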
Now pay attention to how quickly the app runs after we hit the button: it simply runs and loads. You might not notice a huge difference, but there is one. The model is not reloading; the little delay you're seeing is the large model actually analyzing the whole text, parsing everything and finding the entities. The difference in time comes from the model not having to reload each individual time; instead it's simply pulled from the cache, because the argument hasn't changed at all. If I changed this to the spaCy medium model, it would have to reload (it turns out I don't have that one installed in my base environment). Let's run the small model instead: it reloads, because the main parameter has actually changed. If I change back to the large model and rerun, the same thing happens; it has to rerun the load.

So this is one of the things you can do with Streamlit: you can cache data. It's going to be very useful if you're working with machine learning models, especially ones that tend to be quite large, like transformer or BERT models, the spaCy large model, or a large image classifier like Detectron. It's also very useful if you're working with very large datasets: you can load a large dataset once and cache it so you don't have to reload it each individual time (a minimal sketch follows at the end of this section). If you're working with a large quantity of text, somewhere in the range of 10 million words, you don't want to reload all of that textual data on every rerun; caching is the key to solving that problem and making life a lot happier for you while you develop an app. Become familiar with caching data. It might not appear that essential when you're running things in your local environment on localhost, but I promise you, when you start trying to deploy these models in the field and get people to beta test them, you will be very, very happy that you learned how to cache data and models correctly.

That's going to be it for this video. In the next video we start what I think will be the more fun part of this series: tackling very concrete digital humanities problems. We'll build something more robust than this natural language processing app; we'll make data visualization apps and explore the different things you can visualize, such as plots; we'll build a network visualization app to analyze textual networks with a dataset I've cultivated; and we'll even have some fun trying to reinvent Voyant, a very popular DH tool, all in Streamlit. It won't be as good as Voyant, I promise, but it will be a close facsimile, I hope. If you've liked this video series and you get a lot out of this channel, please do consider supporting it via Patreon, and as always, thank you to all my Patreon supporters.
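As promised above, here is a minimal sketch of the same pattern applied to a large dataset. The CSV path, the column-free read, and the use of pandas are hypothetical illustrations, not from the video:

    import pandas as pd
    import streamlit as st

    # Cache the parsed DataFrame so the file is read from disk only on the
    # first run; subsequent reruns of the script reuse the cached copy.
    @st.cache
    def load_corpus(path):
        return pd.read_csv(path)

    df = load_corpus("data/corpus.csv")  # hypothetical path
    st.write(f"Loaded {len(df):,} rows")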
Info
Channel: Python Tutorials for Digital Humanities
Views: 13,637
Keywords: python, digital humanities, python for DH, dh, python tutorial, tutorial, python and the humanities, python for the digital humanities, digital history, Python and libraries, python tutorials, python tutorials for digital humanities, streamlit tutorials, streamlit, streamlit tutorial, streamlit python, streamlit app, caching streamlit, cache data streamlit, how to cache in streamlit, streamlit and python, python streamlit, caching data in streamlit, caching models in streamlit
Id: nF-PQj0k5-o
Length: 6min 43sec (403 seconds)
Published: Mon Jul 26 2021