End-to-End Tutorial: Build and Deploy a Streamlit Application on Heroku

Video Statistics and Information

Captions
Hello everyone, welcome to my channel. My name is Ahmed and I make video tutorials on how to build and scale machine learning applications to the cloud, so if you're interested in this content, please consider subscribing so that you get notified of the latest videos.

In today's video I will show you a Streamlit application I built to interactively play with machine learning models directly from the browser. I will first show you how this app works, how you can use it, and maybe you'll learn something from it if you're a data science practitioner. Then I will show you the code, so that you can learn how to build this app and deploy it from scratch. All the code is available, as always, on my GitHub account; I will post the link in the video description, so go check it afterwards if you want to test this app locally on your computer, or even improve it if you think of amazing features to add. Thank you for watching; now let's have a look.

This application is called Playground. It's a web application I designed, inspired by TensorFlow's Playground, which lets you tinker with machine learning models (specifically deep learning models) directly from the browser and see immediate results. I built this application to interact with different kinds of models, not just neural networks but more classical ones, on some non-linear classification problems.

Let's take a tour. You start by configuring a dataset: you select one from a list of three predefined datasets, and I will start with the moons dataset. Once you select a dataset, you can set the number of samples, the noise on the train data, and the noise on the test data. Then you pick a model. I will start with a logistic regression as a baseline, because it's a simple model, and keep the default parameters. Once the parameters are set, you see results on the left and right sides. On the left side, you will see the decision boundary.
The decision boundary is basically a 3D plot that lets you understand how the model behaves on a specific dataset. In our case the model is linear, so the decision boundary is, as expected, a line separating the data into two classes. In this section we see some performance metrics: the accuracy and F1 score on the test data in green, and the difference from the train data. On the right side you see the execution time of the model, some links to the official scikit-learn documentation if you want to learn more about the model, its API, and how to use it in practice on different problems, and then a code snippet that is generated automatically from the hyperparameters you set here as well as the model definition. In this last section you will see some additional tips on each model, some pros and cons, and how you should use it in general.

In our case, with a linear model (a logistic regression) on this dataset, we get a pretty good performance of nearly 0.9 accuracy and F1 score, which is not bad. But the model is not perfect at classifying these data points, because the model is linear while the dataset has some non-linearity, so there is a mismatch. We can solve this problem in two different ways: we can either transform our data by adding more features, or we can increase the complexity of the model by choosing another kind of model.

Let's start with some feature engineering while sticking with logistic regression. What we can do here is increase the polynomial degree of our data: you add polynomial features here, and I will set the polynomial degree to three. This radically changes the appearance of the decision boundary; our model actually changes because it has a different input, and we will see this change reflected on the decision boundary.
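The feature-engineering trick described above can be sketched with scikit-learn. This is a minimal reconstruction, not the app's exact code; the variable names are mine:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A non-linear toy dataset, like the app's "moons" option
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Plain logistic regression: a linear decision boundary
linear = LogisticRegression().fit(X, y)

# The same model after raising the polynomial degree to 3
poly = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression()).fit(X, y)

print(linear.score(X, y))  # linear fit on non-linear data
print(poly.score(X, y))    # usually noticeably higher on moons
```

The pipeline makes the "different input" explicit: the logistic regression stays linear, but it now operates on polynomial combinations of the two original features, which bends the boundary in the original feature space.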
In our case the transformed model is now perfectly suited to this kind of problem: we didn't need to increase the complexity of the model or pick a more sophisticated one. Sometimes feature engineering is enough. Once you select a polynomial degree, the change is reflected directly in the generated code.

Now let's go back to degree one, with the original data, and pick a more sophisticated model: a decision tree. When you select the decision tree, you immediately see a different decision boundary. This boundary is very specific to decision trees: it has vertical and horizontal lines separating the data into two classes, which reflects the behavior of a decision tree, a succession of conditions on the values of the features. If I want to change the model again and pick a more complex one, I can select a random forest. What we see then is a boundary that resembles the decision tree's, but a bit smoother, because a random forest is a bagging of different decision trees: it classifies a data point through an averaging mechanism that smooths the result of an individual decision tree while reducing the variance.

If you look at this selection, you'll see different kinds of models to choose from: gradient boosting, KNNs, naive Bayes, SVMs, etc. You can even pick a neural network; let's do this. You choose the number of layers, say two, set the number of neurons in each one, and you immediately see the result depicted on the decision boundary.

Basically, this application lets you tinker with different models, different hyperparameters, and different datasets, and it helps you understand many things: it lets you build first intuitions about the impact of each hyperparameter on the results, the execution time, and the performance metrics.
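The tree-versus-forest comparison above is easy to reproduce outside the app. A hedged sketch (my own parameters, not the app's defaults):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# An unpruned tree memorizes the train set (perfect train score, jagged
# boundary); the forest averages many trees, smoothing the boundary
# and reducing variance.
print(tree.score(X_train, y_train), tree.score(X_test, y_test))
print(forest.score(X_train, y_train), forest.score(X_test, y_test))
```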
You can also compare models on performance metrics and training time, see the individual impact of each hyperparameter, and understand the effects of overfitting, underfitting, and so on. Basically, it's a playground that lets you try many models on different types of data. It's not meant to replace textbooks; it's meant for educational purposes only, to improve your knowledge of these different tools, so take it as it is. All the code is available on GitHub, so you can check it online, clone it, run it locally, and even deploy it on Heroku with your own account. If you have ideas on how to improve this application, for example by adding more models or more sophisticated datasets, feel free to push a pull request or create an issue, and I will be more than happy to discuss it with you.

Okay, now that we've seen how this app works and how to use it, let's look at the code to understand how these different filters and graphs were created, so that you understand how to manipulate Streamlit code and how to create your own projects. See you in the code section.

If you want to run this application on your local computer and start tweaking it, or maybe add new features in development mode, I encourage you to go to my GitHub repo and clone the project locally. Once you've done that, you'll have the structure of the project, and inside it you will see two files, Pipfile and Pipfile.lock. These handle the dependencies of the project. To replicate the same virtual environment I've used, I encourage you to have pipenv installed on your machine; once you have it, go into the project and launch the install command (I have already run it, so I don't have to run it again) to install and create the virtual environment with the different packages.
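The local setup steps just described look roughly like this; this is a hedged recap of the commands, assuming you have already cloned the repo and are inside the project folder:

```shell
# From inside the cloned project folder (Pipfile and Pipfile.lock at the root)
pip install pipenv                 # one-time: install pipenv itself
pipenv install                     # recreate the virtual environment from Pipfile.lock
pipenv run streamlit run app.py    # serve the app on http://localhost:8501
```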
I'm using several packages in this project; let me show you. I'm using, for example, scikit-learn, Streamlit, Jupyter, etc., and if you want the same packages, I encourage you to install them as well. Once you have installed your packages, you can start the application: the command is `streamlit run app.py`, and it starts a local server on port 8501.

Now let's look at the global structure of the code. Open the project in your favorite code editor; I'm using VS Code. Here we see the two files I've mentioned, plus other files, Procfile and setup.sh, which we will see later on: they will help us deploy the application. Then we have app.py, the main script that starts the application. This script depends on additional scripts inside the utils folder: ui.py and functions.py. ui.py holds the UI functions responsible for displaying the different UI components of the app. Let me show you: this is a UI component, this is a UI component, and this is a UI component as well. These components can be handled separately in individual functions, and I put those functions inside ui.py for clarity. Inside functions.py we have a function to generate the data after a dataset is selected, code to plot the decision boundary and create the plots we see here, and a function to train the model and return the train metrics and the test metrics as well as the duration. We also have utilities such as transforming an image path into a binary object, and a function that retrieves information about each model, such as the tips and the documentation links; that information lives inside the models folder, which we will see shortly. As I mentioned, there is also the ui script.
The ui script contains the different UI functions. For example, the introduction is inside this function; we also have a dataset selector, which is this part inside the collapsible section; the model selector, inside the collapsible section over here; and the generate-script method that generates the code snippet for our model and our dataset. We also have the footer that displays this part, and the polynomial degree selector here.

If you break this application into two big parts, there is the sidebar here and the body here, and that's how I structured my main file. If you look at the main file, inside my main function there are two function calls: the first is the sidebar controllers, responsible for displaying all this information, and the second is a body function that takes these parameters and handles the display of the different elements. The sidebar controllers are dynamic: if you choose a specific dataset, like the blobs dataset, you get a specific option here that you won't see for the two other datasets; if you select a specific model, you see its respective hyperparameters, and these hyperparameters change according to each model, obviously. We want to take this dynamic behavior into account, and it is implemented inside the sidebar controller.

Once we select a model and a dataset, training is executed automatically on the data, and the output is a decision boundary as well as some metrics: one function is called to display the decision boundary and these two charts, and another function takes this information and displays it on the UI. We will see this in more detail. So, as I said, the sidebar controller first calls the dataset selector. The dataset selector is an object I created in the ui script.
Let me show you what it does; I'll put the two files side by side. Inside the dataset selector we have a beta_expander that lets us collapse the dataset section here, and inside this expander I put a select box to choose a dataset from a list of three options. Then I set the number of samples with a number input here, and the train noise and the test noise with the sliders here and here; if the dataset is the blobs dataset, I also add the number of classes, otherwise the number of classes is known. Finally I return the dataset, the number of samples, the train noise, the test noise, and the number of classes. These objects are returned here, and then, inside the sidebar controller, I call the model selector.

The model selector is basically the same thing: you create a beta_expander to collapse this information (this is entirely optional, it's only for layout purposes), and inside this expander there is a select box, "choose model", with a list of different models. Once a given model is selected, a model object is returned by a method, and for each type of model there is a specific parameter selector that renders this layout; as you can see, the layout here depends on the model. For example, a decision tree has max depth, criterion, min samples, etc., while logistic regression has different parameters, and a different number of parameters. To take this into account, I created for each model a specific Python file that renders this information; I put everything inside a models folder, with one file per model. Let's look at the logistic regression as an example: I have select boxes for the different solvers and penalties and an input for the C parameter, and once the different hyperparameters are set, I put them inside a params dictionary.
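The per-model pattern described above (collect the chosen values into a params dictionary, then unpack it into the estimator) can be sketched without the Streamlit widgets. The function name and defaults here are illustrative, not the repo's exact code:

```python
from sklearn.linear_model import LogisticRegression

def lr_param_selector(solver="lbfgs", penalty="l2", C=1.0):
    """Mimic the app's logistic-regression selector: gather the chosen
    hyperparameters into a dict, then unpack them into the model."""
    params = {"solver": solver, "penalty": penalty, "C": C}
    model = LogisticRegression(**params)
    return model

model = lr_param_selector(C=0.5)
print(model.get_params()["C"])  # 0.5
```

In the app, the arguments come from select boxes and number inputs instead of defaults, but the dict-unpacking step is the same for every model file.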
Then I pass all these hyperparameters into the model and finally return it. I basically have to do this for every model; I think, for simplicity, it's the best way, but if you have other recommendations on how to do this better, feel free to post something in the comments. Let me show you just one more example: to build the same thing for the random forest, I write basically the same selectors, return the parameters inside the params dictionary, and then pass those parameters into the model.

Now, back to the UI. Once the model selector returns, we have the model type, which is the name of the model, and the model object with the specified hyperparameters. Our model is not trained yet: we first have to generate some data, and this data is generated from the dataset we selected, the number of samples, the train noise, and the number of classes. Let's look at this generate_data function. It lives inside functions.py; it's not a UI function, it's more of a utility function. Basically, this function takes these different arguments: if the dataset is the moons dataset it calls make_moons from scikit-learn, otherwise it calls make_circles or make_blobs, which is fairly easy, and finally it returns the train data, the x, and the corresponding targets.

Once we have generated the data, we add, in the sidebar controllers, the feature-engineering section, which is a simple number input, and finally we display the footer here. Now that we have everything we need from the sidebar controllers, we pass these parameters into the body function, which handles the display of the different results.
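The generate_data utility described above can be sketched like this; the exact signature and noise handling for blobs are my assumptions, not copied from the repo:

```python
from sklearn.datasets import make_blobs, make_circles, make_moons

def generate_data(dataset, n_samples, noise, n_classes=2, random_state=0):
    # Dispatch on the selected dataset, as in the app's functions.py
    if dataset == "moons":
        return make_moons(n_samples=n_samples, noise=noise, random_state=random_state)
    if dataset == "circles":
        return make_circles(n_samples=n_samples, noise=noise, random_state=random_state)
    if dataset == "blobs":
        # blobs is the only dataset where the number of classes is configurable
        return make_blobs(n_samples=n_samples, centers=n_classes,
                          cluster_std=noise, random_state=random_state)
    raise ValueError(f"unknown dataset: {dataset}")

X, y = generate_data("moons", n_samples=200, noise=0.15)
print(X.shape, y.shape)  # (200, 2) (200,)
```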
Inside the body function, we first call the introduction, which is the static layout of our application, and then we create two columns with Streamlit's beta_columns method. These two columns are here and here: the first column holds the graphs, and the second holds this information, the snippet, the duration, and the tips. Inside these columns I put placeholders. Placeholders are a way to implement dynamic content: at the beginning they hold no data, they are empty containers, and once the related information is updated, they get re-rendered. So I put a plot placeholder inside the first column, which will contain all this information in one graph, and inside the second column I put several placeholders one after the other: a duration placeholder, a model URL placeholder, a code header placeholder, and so on; this corresponds to these different sections. We have six placeholders that correspond to these six sections.

Now that we have defined the skeleton of our layout, I add the polynomial features, if any: if the degree is still one, we do nothing; otherwise, we expand the features by adding polynomial features. Then I fetch the model URL from the model type. I haven't shown you this yet, but inside the models folder there is also a utils.py holding all the metadata I need to display for each model, like the model imports, the model URLs, and the model information in the form of snippets. Then, since the data has been generated inside the sidebar controllers, I train my model to get the train accuracy, the test accuracy, the train F1, and the test F1, as well as the duration. Once I have this information, I put it inside the metrics object, and with the same information I generate the code snippet, get the model tips based on the model type, and plot the decision boundary and metrics into the Plotly figure. Once I have all this information, I put it dynamically inside the different placeholders.
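The training-and-metrics step just described can be sketched like this; a hedged reconstruction of train_model, not the repo's exact code:

```python
import time

from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def train_model(model, x_train, y_train, x_test, y_test):
    # Fit, time the training, and compute the four metrics shown in the UI
    t0 = time.time()
    model.fit(x_train, y_train)
    duration = time.time() - t0

    y_train_pred = model.predict(x_train)
    y_test_pred = model.predict(x_test)
    train_accuracy = accuracy_score(y_train, y_train_pred)
    train_f1 = f1_score(y_train, y_train_pred)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    test_f1 = f1_score(y_test, y_test_pred)
    return model, train_accuracy, train_f1, test_accuracy, test_f1, duration

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
_, train_acc, train_f1, test_acc, test_f1, duration = train_model(
    LogisticRegression(), x_train, y_train, x_test, y_test)
print(round(test_acc, 2), round(duration, 4))
```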
That way, my layout updates every time something changes here: for example, I put the figure inside the plot placeholder, the duration inside the duration placeholder, and so on and so forth.

I haven't shown you the train_model function yet, so let's see it. Basically, train_model takes a model as input, along with x_train and y_train as well as x_test and y_test; it fits the model, calculates the duration, gets the predictions, calculates the train metrics as well as the test metrics, and sends all these results back. Pretty simple.

As for the generate-script function, it's inside the ui script: it takes all these parameters; if the degree is higher than one, it adds the feature engineering; depending on the dataset selected, it imports it accordingly from scikit-learn; and then it concatenates all this information into the snippet. The result is returned as a string, but to display it as code you have to call the code method from Streamlit.

Now the last thing, maybe the most important one, the part with the most value in this project: plot_decision_boundary_and_metrics. This function is responsible for displaying the decision boundary as well as the different metrics on test and train data. It lives inside functions.py, and it takes the model, the train and test data, and the dictionary of metrics. First of all, it computes the x minimum and maximum as well as the y minimum and maximum, to set the bounds of the decision boundary plot. Then it computes a grid of x and y values, and for each (x, y) it predicts the class of the corresponding point; this is how the colors you see here are produced.
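The grid trick just described can be sketched as follows; grid resolution and padding are my choices, not the app's:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Pad the data range a little so the boundary plot has margins
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

# Build a dense grid and predict a class for every grid point; the
# predicted class of each point decides its color in the contour plot
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

print(zz.shape)  # (200, 200)
```

Passing `xx`, `yy`, and `zz` to a Plotly (or Matplotlib) contour plot produces the two-color background, with the actual data points scattered on top.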
If you take an individual data point, it's a combination of x and y; these two values go into the model, and the model predicts the output, which is either class one or class zero: if it's class one it's green, and if it's class zero it's red. I won't bore you with the details; if you're interested in how this function works, have a look at the code on GitHub, and if you don't understand something, feel free to contact me either on GitHub or by posting a comment on this video.

So now everything is wrapped together, everything communicates with everything else, we have plots, we have graphs, everything is dynamic. The key idea here is to create placeholders so that your data can refresh according to the changes you make here. If you have ideas on how to improve this app, to make it better, faster, or richer in features, feel free to reach out.

Now that our application works locally and you understand how to put all these UI components together, let's see how to deploy it on Heroku, so that it's accessible on the internet and everyone can use it. To deploy our Streamlit application on Heroku, we need to configure three files. The first one is requirements.txt, the list of requirements of our project. There is a quick way to generate it automatically with pipreqs: install pipreqs beforehand using pip, then go inside your project and run pipreqs, and it will generate a requirements.txt automatically. That's our first file. The second file is setup.sh: there's nothing to do here other than copy-pasting this file; it configures some parameters on the server side on Heroku, like disabling CORS and setting the port.
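For reference, the standard Streamlit-on-Heroku recipe for these two helper files looks roughly like this; this is a hedged sketch of the common pattern, not the repo's exact contents:

```shell
# setup.sh — run on the Heroku dyno before the app starts; it points
# Streamlit at the port Heroku assigns and disables CORS
mkdir -p ~/.streamlit
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
enableCORS = false
port = $PORT
EOF

# The Procfile (a single line, no extension) then ties the two together:
# web: sh setup.sh && streamlit run app.py
```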
You just copy-paste it. Finally, there is the Procfile, which is how Heroku will start your application: same thing, you put the name of your entry file here and keep the rest as it is. Once you have this, you put your project inside a GitHub repo, for example, and then you create an account on Heroku. I have already deployed my application, which is accessible on playground-ml.herokuapp.com, but let's do it again under another name. After creating an account, you go here and create a new app; I'm going to call it playground-ml-2, select the region, and create the app. Then they ask you about the deployment method: you can connect through the Heroku CLI or a container registry, for example, but I'm going to use GitHub, because my code is on GitHub, and I put the repo name, which is "playground" in my case. So I connect my GitHub, and then they ask me about the branch to deploy; all my code is on the main branch, so I won't change this. Finally, I can deploy my branch by pressing this button. You can enable Heroku to automatically deploy your branch whenever you push changes to main, but I won't do this. Once I press "deploy branch", it starts building the image with the dependencies of my project and then deploys it; this is maybe the time to grab a coffee and come back when it's done.

Well, after a few minutes of building, the app is finally deployed, and you can check this by pressing this button. Indeed, it's deployed: we have playground-ml-2.herokuapp.com, and if we test the application, it works; if I select another model, say a decision tree, it changes, and the app seems quite fast as well.

So we've seen in this project how to put in place a Streamlit application and how to compose the UI with different dynamic layouts, graphs, snippets, and so on.
We have also seen how to deploy this application very easily on Heroku. I hope you liked this video and that it will be useful for your projects. If you want to improve Playground, feel free to do it: you can create pull requests or issues, and if you have problems running the code, don't hesitate to open an issue or leave a comment in the comment section. If you enjoyed this video, please consider sharing it with your friends or colleagues, or maybe hitting the like button. Thank you for watching, and see you in the next videos.
Info
Channel: Ahmed Besbes
Views: 753
Rating: 5 out of 5
Keywords: streamlit, interactive web app python, data visualization streamlit, streamlit tutorial, deploy streamlit on heroku, python web app, machine learning app, heroku, ahmed besbes, interactive visualization with streamlit, streamlit python, streamlit python tutorial, streamlit dashboard, streamlit web app, streamlit app
Id: htKmCWrFYr8
Length: 35min 5sec (2105 seconds)
Published: Mon Apr 05 2021