Build A Beautiful Machine Learning Web App With Streamlit And Scikit-learn | Python Tutorial

Captions
Hey guys, welcome to a new tutorial. Today I want to show you how we can develop an interactive machine learning application with Streamlit. Streamlit is a free and open-source framework to rapidly build machine learning web apps. I really like this framework because it's super easy to get started, even for beginners, and I think the UI looks beautiful. So let me show you the application that we are going to implement. It's a simple machine learning app that lets you explore different classifiers and different datasets. For example, here you can select the Iris dataset, the breast cancer dataset, or the wine dataset, then you can select different classifiers and also update the parameters for each classifier. Here you get an overview of the dataset and your classifier and the accuracy, and you also get a plot of the dataset. Whenever we change something, all of this is updated, so it's interactive. That's what we are going to build, so let's start.

In order to install Streamlit we can simply use pip, so we say pip install streamlit. I already did this, and we also need scikit-learn and matplotlib, so we also say pip install scikit-learn and pip install matplotlib. As a bonus, you will also learn how to use the scikit-learn framework and implement different machine learning classifier pipelines.

First, to explore Streamlit, I recommend simply running streamlit hello after installation. This fires up the basic example app, and if we go to localhost we see different examples. For example, I can choose the plotting demo, and once it's loaded we also see the code, so you can have a look at it and explore it. I recommend that you check out these four examples. Now let me close this, stop the server, and start implementing our own app.

To get started we have to import Streamlit, so we say import streamlit as st. Then, for example, we can set a title by saying st.title and giving it a title, let's say "Streamlit example". Now we save this, and to run our own script we say streamlit run followed by the name of our file, in this case main.py. When we hit enter we see that our app is hosted on localhost:8501, so let's open it, and there is our app up and running. By the way, we can also activate hot reloading: click the menu, then Settings, and then "Run on save", so the application updates whenever we make a change and save it.

Let's continue. After setting a title we can also write some text, so we say st.write, and here we can use Markdown. For example, we can use an h1 heading by writing "# Explore different classifiers", and when we save and go back we see the heading here. We can also write normal text, of course, so we add "Which one is the best?", save, and the text appears. Now that we have seen how to add text, let's also add some widgets. For example, let's add a select box: we say st.selectbox, first give it a label, "Select Dataset", and then pass the different options as a tuple; here we want the Iris dataset, the breast cancer dataset, and the wine dataset, since these are all popular datasets for machine learning beginners. If we go back to our Streamlit app, we see that we already have this selection box where we can choose Iris, breast cancer, or wine.
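Putting these first steps together, a minimal sketch of the script might look like this (the file name main.py and the exact label strings are assumptions based on the narration):

# main.py -- run with: streamlit run main.py
import streamlit as st

st.title("Streamlit example")

# st.write accepts Markdown, so "#" renders as an h1 heading
st.write("""
# Explore different classifiers
Which one is the best?
""")

# First argument is the label, second is the tuple of options
st.selectbox("Select Dataset", ("Iris", "Breast Cancer", "Wine"))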
What we can also do is assign this to a variable, so we say dataset_name equals the select box, and then whenever we choose a dataset in the select box, the choice gets assigned to this variable. For example, we can say st.write(dataset_name), and we see that this is updated: if I choose the wine dataset, we see the text here. Whenever we change one of the widgets, the whole script runs again, which is why we get the update. If you're asking yourself whether rerunning the whole script every time is super slow, I can tell you that Streamlit has intelligent ways to cache things. I won't go into this in this tutorial, but you should know that you can cache results you have already computed so the whole script doesn't have to run again. For now this is fine: whenever we change something, the app updates.

We can also easily move this to a sidebar. For this we say st.sidebar.selectbox, and we see we automatically got a sidebar — isn't this beautiful? So now we have our dataset select box in the sidebar. Next, we also want different classifiers, so let's create a select box for this too: classifier_name equals, and then I copy and paste this, so this is also a select box on the sidebar with the label "Select Classifier". For this example I want to use three popular machine learning classifiers: the K-nearest-neighbors algorithm (KNN), the SVM algorithm (support vector machines), and random forests. I will not explain them in detail, but if you want to learn more about them, I have a whole playlist about machine learning algorithms from scratch; I will put the link in the description so you can check it out. A quick look at the app shows the different classifiers: KNN, SVM, and Random Forest.

Now let's load the dataset. For this we define a function, let's call it get_dataset, which gets the dataset name and, based on that name, loads the corresponding dataset. We use the scikit-learn library for this, which you have to install as I said in the beginning, and we say from sklearn import datasets, because some datasets are already included in the library. Down in our function we check the dataset name: if it equals "Iris", then our data (let's simply call the variable data) equals datasets.load_iris(), which is already included in scikit-learn. Then we say elif the dataset name equals "Breast Cancer", then data equals datasets.load_breast_cancer(), and otherwise, as the last option, data equals datasets.load_wine(). Then we split this into X and y, so X equals data.data and y equals data.target, and we simply return X and y. Now we load the different datasets by calling this function: X, y = get_dataset(dataset_name) — as I said, we assigned the selection to a variable, so we can use it here.
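A sketch of this part, assuming the option labels match the names checked inside get_dataset:

import streamlit as st
from sklearn import datasets

# Widgets created on st.sidebar show up in the collapsible sidebar
dataset_name = st.sidebar.selectbox("Select Dataset", ("Iris", "Breast Cancer", "Wine"))
classifier_name = st.sidebar.selectbox("Select Classifier", ("KNN", "SVM", "Random Forest"))

def get_dataset(dataset_name):
    # Load one of the toy datasets that ship with scikit-learn
    if dataset_name == "Iris":
        data = datasets.load_iris()
    elif dataset_name == "Breast Cancer":
        data = datasets.load_breast_cancer()
    else:
        data = datasets.load_wine()
    X = data.data
    y = data.target
    return X, y

X, y = get_dataset(dataset_name)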
Now let's simply write some information so we can see that this works. We say st.write and pass the shape of the dataset, which is X.shape, and let's also get the number of different classes in our dataset: st.write("number of classes") and then the length of np.unique(y). This collects all the unique values — for example, if the classes are 1, 2, and another 2, it only outputs 1 and 2, and the length of that is 2. Let's save this, and we get an error right away, which is helpful: name 'np' is not defined. Of course, we have to import numpy, so we say import numpy as np — this should already be installed when you install scikit-learn; otherwise you can also install it with pip. Now if we reload (I think I have to hit enter), we see it's working. If we choose, for example, the breast cancer dataset, the shape and the number of classes are updated, so it seems to work.

Let's continue. The next thing we want is different parameters that we can modify for each of the classifiers. Down here let's create another function, def add_parameter_ui, which gets one parameter, clf_name, the classifier name. Based on this it will expose different parameters and also put them in the UI. Let's create an empty dictionary params and then check the classifier name. If clf_name equals "KNN", we say K equals, and we add this to the sidebar with another widget, the slider widget: st.sidebar.slider. We give it the label "K" and then a start and a stop value, so let's go from 1 to 15. We could also give it a type, integer or float, but Streamlit can figure this out on its own if the data type is clear, so I will leave the type out. Then we put this in the dictionary: params with the key "K" gets K, and we return params at the end. Let's also call add_parameter_ui with the classifier name that we got up here, classifier_name, and see if this works. If we reload (sometimes we have to reload manually), we see we have our slider widget ranging from 1 to 15, and whenever we update the value we briefly see that it's running, so the script runs again. For KNN it's very easy: we only have K, the number of nearest neighbors we want to look at.

Now let's also do this for the other two classifiers. We say elif clf_name equals "SVM", and here we use the C parameter. I recommend that you check out the official documentation — if you google "SVM scikit-learn" you can see that one of the most important parameters is the C value. So we do the same as above, but now we call it C and create a slider for it, in this case ranging from 0.01 up to 10.0, so Streamlit can figure out that this is a float value. We put it into our parameter dictionary as well: params["C"] = C. I'm only going to use this one parameter, and I recommend that you check out the official documentation and add the other parameters on your own, but for now this is fine.

As the last classifier, we do this for the random forest classifier. One important parameter here is the maximum depth of the trees, so we say max_depth equals, and again use a slider widget labeled "max_depth", ranging, let's say, from 2 to 15 — so minimum 2 and maximum 15; sorry, this is not the number of trees but the maximum depth of each tree. We can also configure the number of trees, though: we say n_estimators equals, again a slider widget labeled "n_estimators", and let's have this range from 1 to 100. Again, I recommend that you check out the documentation — if you google "sklearn random forest" you will see all the different parameters, but for now we only use these two. Then of course we put them into our dictionary, params["max_depth"] = max_depth and params["n_estimators"] = n_estimators, and we return the dictionary at the end, so now we have those parameters.
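The parameter UI described above might look roughly like this (the slider labels and dictionary keys are assumptions):

import streamlit as st

def add_parameter_ui(clf_name):
    # Draw sidebar sliders for the chosen classifier and collect the values in a dict
    params = {}
    if clf_name == "KNN":
        K = st.sidebar.slider("K", 1, 15)
        params["K"] = K
    elif clf_name == "SVM":
        # float endpoints make Streamlit render a float slider
        C = st.sidebar.slider("C", 0.01, 10.0)
        params["C"] = C
    else:
        max_depth = st.sidebar.slider("max_depth", 2, 15)
        n_estimators = st.sidebar.slider("n_estimators", 1, 100)
        params["max_depth"] = max_depth
        params["n_estimators"] = n_estimators
    return params

params = add_parameter_ui("KNN")  # in the app: add_parameter_ui(classifier_name)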
The last thing we have to set up is the actual classifier, but first let's save this and have a look at our app to check that it's working. When we select KNN we have the K parameter from 1 to 15; when we select SVM we have the C parameter from 0.01 up to 10, which is a float value; and for random forest we should get two sliders — yes, one is max_depth and the other is n_estimators. So this works, let's continue.

Now we want to create the actual classifier, so let's create another function, get_classifier, which again takes the classifier name and also the params. Up here, the call to add_parameter_ui returns the params, so we say params = add_parameter_ui(classifier_name), and down in get_classifier we do the same kind of if/elif statements, but here we create the actual classifier. For this we have to import the classifiers from scikit-learn, so at the top we say from sklearn.neighbors import KNeighborsClassifier (I'm wondering why the autocomplete is not coming up, but never mind), then from sklearn.svm import SVC, the support vector classifier, and for random forests from sklearn.ensemble import RandomForestClassifier — I hope I don't have a typo, but let's try it out. Down here we create them: for KNN we simply say clf = KNeighborsClassifier(n_neighbors=params["K"]), for SVM we use SVC with the parameter C=params["C"], and as the last thing we set up the random forest classifier, clf = RandomForestClassifier(n_estimators=params["n_estimators"], max_depth=params["max_depth"]). We also pass a random_state so we can reproduce our results, because as the name says this is randomized — let's put in 1234, or whatever you want. Then we have our random forest classifier, so let's simply return it and call the function: clf = get_classifier(classifier_name, params).
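A sketch of get_classifier, wired to the params dictionary from add_parameter_ui (the final call uses a fixed name and params only so the snippet runs on its own):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def get_classifier(clf_name, params):
    # Build the classifier from the slider values collected in add_parameter_ui
    if clf_name == "KNN":
        clf = KNeighborsClassifier(n_neighbors=params["K"])
    elif clf_name == "SVM":
        clf = SVC(C=params["C"])
    else:
        clf = RandomForestClassifier(
            n_estimators=params["n_estimators"],
            max_depth=params["max_depth"],
            random_state=1234,  # fixed seed so the random forest is reproducible
        )
    return clf

clf = get_classifier("KNN", {"K": 5})  # in the app: get_classifier(classifier_name, params)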
Let's see if this is working. Let's reload — we don't have an error for KNN, which is good, but here we do have an error; let's try this one, and we also get an error there. Let's look at the first one: C is not defined in line 57, params["C"] = C — right, we don't want to update the params here, so let's remove that, save, and run again. Now if we select SVM it's working, but if we select random forest we still get an error: there are multiple identical st.slider widgets with the same generated key. This is because we don't want to create the sliders again here — we already create them up in add_parameter_ui; here we only want to create our classifier. So again, let's save and run this, and now if we select random forest we no longer get an error. This looks good, so we can continue.

Now we can finally do the classification. For this we want to import one more thing from scikit-learn: from sklearn.model_selection import train_test_split, which splits our data into training data and test data. Down here we call it: X_train, X_test, y_train, y_test = train_test_split, and we pass in X and y, which we already have, because we want to split this data. We can define the ratio by saying test_size=0.2, so 20% of the data is used for testing and the rest is used for training, and we can also give this a random_state to make our results reproducible, let's say 1234. Then, to train our classifier, we simply call clf.fit(X_train, y_train).
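The split-and-train step so far, sketched with a fixed dataset and classifier as stand-ins for the values the app computes:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = datasets.load_iris()                 # stand-in for get_dataset(dataset_name)
X, y = data.data, data.target
clf = KNeighborsClassifier(n_neighbors=5)   # stand-in for get_classifier(...)

# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)
clf.fit(X_train, y_train)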
The API is the same for all the classifiers in the scikit-learn library, which is pretty nice. After fitting we call the predict method, so we say y_pred = clf.predict(X_test), and then we have the predictions. Now let's compute the accuracy. For this we import another metric, from sklearn.metrics import accuracy_score, and down here we use it: acc = accuracy_score(y_test, y_pred), so it takes the actual labels, y_test, and the predicted labels, y_pred. Then let's write this to our app. We say st.write, and let's use an f-string for this (you need Python 3.6 or newer). First we write our classifier name, f"classifier = {classifier_name}", using the variable in curly braces, and let's also write the accuracy with another f-string, f"accuracy = {acc}". If we load this again, this should show up, and now we can explore the different classifiers. For example, with K = 7 we get an accuracy of 0.96, and with 10 nearest neighbors we get an accuracy of 1.0 — perfect. The Iris dataset is pretty easy, so we should get a perfect or almost perfect accuracy with the different classifiers. Let's also choose the breast cancer dataset, which is a little more complex, and now you can play around and see which parameters are the best.

As the last thing, I also want to show you how we can plot the dataset. Down here, let's add a comment, "PLOT", and use a little trick, because our dataset can have more than two dimensions but we can only show two in a 2D plot. We say from sklearn.decomposition import PCA — this is the principal component analysis algorithm, which I also show in my machine-learning-from-scratch tutorials. It is basically a dimensionality reduction method that transforms, or projects, our features onto a lower-dimensional space, so that we can show the data in 2D. Let's use it: pca = PCA(2), where we specify the number of dimensions we want to keep, 2 for 2D, and then we get the projected data with X_projected = pca.fit_transform(X), so we can do this in one operation, passing in the whole data X. Note that we don't need y here — PCA is an unsupervised technique that doesn't need the labels, which is important to remember. Now let's plot the data. Of course we also have to import matplotlib, so import matplotlib.pyplot as plt. We want a 2D plot, so we get the first axis with x1 = X_projected[:, 0], using slicing to take all the samples but only dimension 0, and x2 is the same with dimension 1. Then we create a figure, fig = plt.figure(), and say plt.scatter(x1, x2); we also want to color the points based on the labels, so we pass c=y, give it an alpha of 0.8 to make it a little transparent, and set the colormap with cmap="viridis", a built-in colormap in matplotlib that I think looks pretty nice. Let's also label the axes: plt.xlabel("Principal Component 1"), and let's copy and paste this for the y label, plt.ylabel("Principal Component 2"). Then let's also add a colorbar with plt.colorbar(). Now, instead of calling plt.show(), which is what you would do with normal code, we use st.pyplot, which is built into the Streamlit framework, and when we reload we should see the plot here.
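A sketch of this plotting section, assuming the axis label texts and the viridis colormap mentioned above:

import matplotlib.pyplot as plt
import streamlit as st
from sklearn import datasets
from sklearn.decomposition import PCA

data = datasets.load_iris()        # stand-in for the selected dataset
X, y = data.data, data.target

# Project the features onto 2 principal components; PCA is unsupervised,
# so the labels y are only used here for coloring the points
pca = PCA(2)
X_projected = pca.fit_transform(X)

x1 = X_projected[:, 0]
x2 = X_projected[:, 1]

fig = plt.figure()
plt.scatter(x1, x2, c=y, alpha=0.8, cmap="viridis")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.colorbar()

# st.pyplot renders the matplotlib figure inside the Streamlit app
st.pyplot(fig)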
We see it's working: we have the Iris dataset with its four dimensions transformed into only two and plotted here, and if we select a different dataset we see that the plot is updated as well — let's also try the wine dataset. So this is working, and now we have a nice little interactive machine learning web app. I hope you enjoyed this, and if you want an assignment so you can improve the app on your own, I can give you a few more tasks. For example, you can add more parameters — as I showed you, the scikit-learn library has more parameters available for most of the classifiers, so try them out as well. You can of course also add other classifiers. And as a last thing, I recommend playing around with feature scaling; for feature scaling you also have a lot of methods available in the scikit-learn framework. That's what I would recommend you try out on your own, and that's all I wanted to show you for today. I hope you enjoyed this tutorial, and if you liked it, please subscribe to the channel. See you next time, bye.
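As a starting point for the feature-scaling task mentioned above, here is a minimal sketch using scikit-learn's StandardScaler (one of several scalers in sklearn.preprocessing); the wine dataset and the KNN classifier are just stand-ins for whatever the app has selected:

import streamlit as st
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data = datasets.load_wine()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
st.write(f"accuracy with scaling = {acc}")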
Info
Channel: Python Engineer
Views: 59,939
Rating: 4.9768519 out of 5
Keywords: Python, Python Tutorial, Streamlit, Machine Learning, ML, ML Web App, scikit-learn
Id: Klqn--Mu2pE
Length: 38min 45sec (2325 seconds)
Published: Wed Jun 03 2020