Build 12 Data Science Apps with Python and Streamlit - Full Course

Welcome to the beginner's course on how to build 12 data apps in Python with Streamlit. I'll be your instructor for today. Some of you might know me as the Data Professor from my YouTube channel, Data Professor, where I teach data science and machine learning and also provide bioinformatics project walkthroughs. Aside from being a YouTuber, I'm also an associate professor of bioinformatics, where I teach and do research at the interface of machine learning and computational drug discovery.

In this course we will be building 12 interactive, data-driven web applications in Python using the Streamlit library. Streamlit lets you make use of data and all of the usual Python libraries, such as NumPy, SciPy, Matplotlib, and Seaborn, right inside the Python environment, and create an interactive web application that can pre-process datasets, visualize the data, and also make predictions with machine learning, all in the form of a web application. A basic working knowledge of Python is assumed, but don't worry: I'll be holding your hand with a step-by-step walkthrough, and I'll do my best to simplify the concepts and topics covered.

Before proceeding further, let's take a look at the 12 data web applications that we will be building today; I've arranged them according to the topics covered. In the first two apps we will build a very simple stock price application and a simple bioinformatics DNA count application. Afterwards we will build four EDA applications using basketball data, football data, S&P 500 stock prices, and cryptocurrency price data. We'll also develop two classification models and embed them into web applications, for the Iris dataset and the Penguins dataset, and build two regression models, on the Boston housing data and a bioinformatics solubility dataset. Finally, I'll show you how you can deploy your application to the Heroku platform and to the Streamlit Sharing platform.

For more data science, machine learning, and bioinformatics projects, please make sure to subscribe to my YouTube channel, the Data Professor, and also follow me on Medium, where I regularly publish blog posts on data science and machine learning. Links to all of these are provided in the description of this video. So grab yourself a cup of coffee and, without further ado, let's get started.

Have you ever wanted to build a data-driven web application for your data science projects, but felt intimidated by the difficulty of coding in Django or Flask? If you answered yes, then you'll want to watch this video to the end, because I'm going to show you how you can build a data-driven web application in just a few lines of code. The Python library that allows you to build a simple data-driven web application is called Streamlit. This library was actually brought to my attention by one of the subscribers of this YouTube channel, so please give a big hand to Iqbal for recommending this excellent library for developing a simple data-driven web application for your data science project. The first thing you want to do is head over to the Streamlit website by typing in streamlit.io; I'll provide the link in the description of this video. This is the Streamlit website.
As you will see, it says that it is the fastest way to build a data app. Here you can see that you could build an OpenCV web application from within Streamlit, and you could add a lot of interactive elements as well. To get started, install Streamlit by typing pip install streamlit, and after the installation finishes, type streamlit hello to check that it was installed successfully. As you can see, a simple web application can be built in just a few lines of code, and in this second example you can also add widgets to the app: this slider widget lets you select numbers just by dragging the slider bar. In this third example, you can deploy your web application easily using Git. So there you have it: a minimal framework for building a powerful web application while requiring only a few lines of code.

Here is the gallery of web applications built with Streamlit, so let's have a look. This web application using TensorFlow was built in Streamlit, and there are other great examples of Streamlit applications built by the user community; these are just a selection. If you have built a web application using Streamlit, you can also share it via Twitter, and the Streamlit website will showcase it on this gallery page. You can see that a wide variety of web applications have been built with Streamlit.

Now that we have had a brief introduction to Streamlit, let's have a look at how we can build one ourselves. The first thing you want to do is fire up your terminal: if you're using Microsoft Windows, type cmd in the search bar and a terminal prompt will come up, then type pip install streamlit and hit Enter. Since I have already installed Streamlit, I'm going to proceed with showing you how to build the application. I installed Streamlit inside a conda environment, so I'll activate my environment by typing conda activate dp.

I've created a Python file called myapp.py, and its contents are shown here. It is approximately 20 lines of code; if you deduct the empty lines, it is fewer than 20. Aside from installing Streamlit, for this example you also want to install yfinance, so you can type pip install yfinance. After that, type in the following lines of code; for your convenience I'll share the link to this file on the Data Professor GitHub, so check the description of this video and download the file.

The first three lines of code simply import yfinance as yf, streamlit as st, and pandas as pd. The next block of code writes the header of the web application. As you can see, it is written in Markdown: the hashtag indicates that this line is a heading of type one, so it will be big text, followed by ordinary text saying "Shown are the stock closing price and volume of Google." The next block of code I've taken from a Towards Data Science article, so check that article out and give it a clap; I extracted a few lines from it for this app.
The next line of code sets the ticker symbol of Google, which is GOOGL. The line after that takes the ticker symbol and retrieves the historical data of Google's stock price, with the period set to one day, a start date of May 31, 2010, and an end date of May 31, 2020, and saves it into the ticker DataFrame. The contents of this DataFrame comprise the following columns: Open, High, Low, Close, Volume, Dividends, and Stock Splits. In this web application we're going to show two line charts: the closing price and the volume. It's a very simple web application, and you can customize it to your own liking; I'll also show you that we can edit the contents of the file and the web application will serve the updated version in real time.

Let's type cd Desktop, because this file is on the desktop, then streamlit run followed by the name of the application, myapp.py, and hit Enter. That's it: it spawns a web server, and this is what you see, a simple stock price application. Let me show you the code and the application side by side. Here is the "Simple Stock Price App" heading; since it has one hashtag, it has the heading-one style, h1 in HTML. If we use two hashtags it becomes a bit smaller: save it, Streamlit detects that the source file has changed, we select "Always rerun", and it updates to be a bit smaller, as you can see here. If I add an additional hashtag and save, it gets even smaller. I'll change it back to one hashtag so it is heading one again.

Let's also modify this a bit with Markdown: I'll make "closing price" bold, and make "volume" bold and italic. See, you can customize the style in Markdown, so you'll want to refer to a Markdown cheat sheet; the Markdown cheat sheet by Adam Pritchard is a good one, and it has everything you would ever want to know about Markdown. You can add lots of things here: unordered lists, links, images, tables. For example, let me copy this table and paste it here; there it is, very neat, right? But let's delete it for simplicity.

The ticker symbol is Google, but we can change it to other values as well: let's say AAPL for Apple, save it, and now this is the price for Apple. As you can see, the website updates automatically. This is the date range shown in the line chart, and this is the actual line chart, so if I delete one of them and save, only one will be shown. And let's say I want to write something in: I'll add a heading for "Closing Price", maybe make it heading two, and do the same for the volume, and there we see the customized version. You can even zoom in, because this is an interactive chart, and to go back to the original view you just double-click on it; same thing here, just double-click and it returns to the original state. And there you have it: a data-driven web application in just a few lines of code.
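To recap this first app, here is a minimal sketch of what myapp.py looks like based on the walkthrough above; the exact file is on the Data Professor GitHub, so treat the wording, ticker symbol, and dates here as an approximation:

```python
# myapp.py -- minimal sketch of the simple stock price app described above
import yfinance as yf
import streamlit as st

st.write("""
# Simple Stock Price App

Shown are the stock **closing price** and ***volume*** of Google!
""")

tickerSymbol = 'GOOGL'                  # ticker symbol for Google
tickerData = yf.Ticker(tickerSymbol)    # get data on this ticker
# historical prices: one row per day from 2010-05-31 to 2020-05-31
tickerDf = tickerData.history(period='1d', start='2010-5-31', end='2020-5-31')
# columns: Open, High, Low, Close, Volume, Dividends, Stock Splits

st.write("## Closing Price")
st.line_chart(tickerDf.Close)

st.write("## Volume")
st.line_chart(tickerDf.Volume)
```

You would run it from the folder containing the file with streamlit run myapp.py.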
Do you want to build a bioinformatics web application? If you answered yes, then watch this video to the end, because I'm going to show you how to build a very simple bioinformatics web application in Python. Without further ado, we're starting right now.

The bioinformatics web application that we are going to build today is called the DNA nucleotide count web application. Let's fire up the terminal and activate my conda environment, which is called dp (dp standing for Data Professor), and go to the folder where I have my Streamlit web application files: cd Desktop, cd streamlit, cd dna. We have a total of three files here: aromatase.fasta is an example data file, but we're not actually using it to build the web application, so essentially we only need the Python file dna-app.py and the image dna-logo.jpg, which is the logo displayed in the web application, right here on line 14. Let me fire up the web app by typing streamlit run dna-app.py. For those of you who don't have Streamlit installed, you can install it via pip with a simple pip install streamlit.

So here is the web application. This image is the logo, which looks like this; I drew it using the GoodNotes application on an iPad. Let's look at the web application and the code side by side, and let me also increase the font size so it's bigger for you.

In the first few lines we import the necessary libraries for this web application. On line 5 we import pandas, because we'll use a DataFrame from the pandas library; the basis of this web application is the Streamlit library, so we import streamlit as well; for the chart we'll use the Altair library; and for displaying the logo we need from PIL import Image. The block of code from lines 10 to 24 displays the DNA logo and the page header: we create a variable called image that contains the logo file, and then we display the image, allowing it to expand to the column width. Then we print the header, "DNA Nucleotide Count Web App", shown in bold, followed by a short explanation, "This app counts the nucleotide composition of query DNA", and the three asterisks render a horizontal rule, which is this horizontal line here.

The next block of code shows the text box: st.header displays the header "Enter DNA sequence", which is right here, and the text box itself is displayed using st.text_area. We set the height to 250, so if you want it a bit smaller you can adjust the number: with a height of 150 the box is smaller, and with 350 it's bigger, so let's keep it at 250.
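Gathering the part walked through so far, here is a minimal sketch of the top of the DNA app; the file names, placeholder sequence, and exact wording are approximations of what the video shows:

```python
# dna-app.py -- sketch of the imports, logo, header, and input box
import pandas as pd
import streamlit as st
import altair as alt
from PIL import Image

# Page logo
image = Image.open('dna-logo.jpg')
st.image(image, use_column_width=True)  # as in the video; newer Streamlit prefers use_container_width

# Page title and description (*** renders a horizontal rule)
st.write("""
# DNA Nucleotide Count Web App

This app counts the nucleotide composition of query DNA!

***
""")

# Input text box with a placeholder sequence (made up here for illustration)
st.header('Enter DNA sequence')
sequence_input = ">DNA Query\nGAACACGTGGAGGCAAACAGG\nATCTTCCAGACGTCGCGACTC\nTTTTAAAACCCGAGAGAGAGA"
sequence = st.text_area("Sequence input", sequence_input, height=250)
```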
The sequence_input variable holds the sample DNA sequence shown here; if I modify the name on the first line, the name changes here as well, and the \n is a newline, so if I remove it there won't be a new line. You could, for example, paste in your own DNA sequence, press Ctrl+Enter, and you'd get a different output. Upon reading in the sequence, the input from the st.text_area text box is assigned to the sequence variable, and then we split the sequence into lines. Splitting means it creates a list with one member per line: the first member of the list is the name line, the second line is the second member, the third line is the third member, and the fourth line is the fourth member. To make this clearer, when we call sequence.splitlines() we see that each of the four lines becomes one of the four members of the list; the line breaks are indicated by the \n newline character, which is the equivalent of pressing the Enter key on the keyboard.

Next, we skip the first line by slicing with sequence[1:], which means that we keep index 1 onwards, that is, the second member of the list onwards. We do this because the first line (index 0) is the name of the sequence, which we don't want: we want to compute the DNA composition using only the sequence lines, indices 1, 2, and 3, until the end. So this line essentially skips the sequence name and slices out everything from index 1 onwards, and the result is assigned back to the same variable, sequence. If we look at the contents again, the name is now gone, we have only the sequence, and the list now has three members instead of four. It's very helpful to inspect the contents of a variable line by line like this, so you can see how the input data is being modified by each line of code. Finally, the next line of code joins the three remaining lines together to form one long stretch of DNA sequence. Let me show you what I mean.
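As a standalone illustration (using a made-up short sequence, not the app's code), this is what those steps do to the input:

```python
# Illustration of the splitlines / slice / join pre-processing described above
sequence_input = ">DNA Query\nGAACACGTGGAG\nATGCATGCATGC\nTTTTAAAACCCG"

lines = sequence_input.splitlines()
print(lines)        # ['>DNA Query', 'GAACACGTGGAG', 'ATGCATGCATGC', 'TTTTAAAACCCG']

lines = lines[1:]   # skip index 0, the sequence name line
print(lines)        # ['GAACACGTGGAG', 'ATGCATGCATGC', 'TTTTAAAACCCG']

sequence = ''.join(lines)   # empty separator: nothing inserted between the lines
print(sequence)     # 'GAACACGTGGAGATGCATGCATGCTTTTAAAACCCG'
```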
So we get one single line of sequence: the three lines shown above are combined into one. Notice that the join uses two quotation marks with nothing between them; if I add a space there, a space is inserted between the joined lines, or we could even make it a newline, or add a dot, and you'd see the dots appear between the lines. We don't want to add anything, so we keep the two quotation marks together, giving us one long stretch of sequence. Now we have a DNA sequence that is pre-processed and ready for computation, so let me delete that experiment.

Let's look further. Line 46 is the comment for printing the input DNA sequence: the pre-processed sequence is printed under the input header, "INPUT (DNA Query)". So, to recap, we read in the DNA sequence, discarded the first line so that we only keep the nucleotide information, and joined the rest together into a single line, as shown here. Now let's proceed to line 50, which shows the output header, "OUTPUT (DNA Nucleotide Count)". Under this heading I'm going to show you four different ways to display the output, and before displaying anything we do some computation.

The first method prints a dictionary. We use st.subheader, which is slightly smaller than the header. Lines 55 to 62 create a custom function for counting the DNA nucleotides in the sequence: def, the name of the function, and the input argument seq. Inside, we create a variable d, which is a dictionary containing four members, A, T, G, and C; for A we count the number of A's in the DNA sequence using the .count function, and the same for T, G, and C, and at the end the function returns d. So d is a dictionary containing the name of each nucleotide along with its count, for example A: 59, T: 43, G: 52, C: 56, meaning there are 59 adenines, 43 thymines, 52 guanines, and 56 cytosines. We then apply this custom function to the input sequence, assign the result to X, and display X (I believe we didn't end up using the separate labels and values here, so let's save it).

Now let's move on to the second method, printing text. The dictionary isn't very friendly for the end user, so let's print it in human-readable form: "There are 59 adenine (A)", "There are 43 thymine (T)", "There are 52 guanine (G)" — oops, I had a typo there from copying and pasting, let me fix the abbreviation — and so on. So the second part is finished.
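Continuing the sketch from above, this is roughly what the pre-processing and the first two output sections look like in code; the exact header wording is an approximation:

```python
# Continuing the dna-app.py sketch: pre-process the text-box input and
# display the first two output sections described above.
sequence = sequence.splitlines()   # split the input into a list of lines
sequence = sequence[1:]            # skip the first line (the sequence name)
sequence = ''.join(sequence)       # join the remaining lines into one string

st.header('INPUT (DNA Query)')
st.write(sequence)

st.header('OUTPUT (DNA Nucleotide Count)')

# 1. Print dictionary
st.subheader('1. Print dictionary')

def DNA_nucleotide_count(seq):
    d = dict([
        ('A', seq.count('A')),
        ('T', seq.count('T')),
        ('G', seq.count('G')),
        ('C', seq.count('C')),
    ])
    return d

X = DNA_nucleotide_count(sequence)
st.write(X)

# 2. Print text
st.subheader('2. Print text')
st.write('There are ' + str(X['A']) + ' adenine (A)')
st.write('There are ' + str(X['T']) + ' thymine (T)')
st.write('There are ' + str(X['G']) + ' guanine (G)')
st.write('There are ' + str(X['C']) + ' cytosine (C)')
```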
The third part displays the result as a DataFrame, with essentially two columns: nucleotide and count. This block of code creates a DataFrame from the dictionary, and then we rename the column, because by default the count column is labelled 0; let me show you — see, it becomes 0, so we relabel 0 to "count". As you can see, this block of code gives you this DataFrame, and the same DataFrame is used for creating the bar chart in part four, displaying the bar chart. For the fourth part we use the Altair library: line 87 prints the subheader, lines 88 to 91 create the actual plot, and lines 92 through 94 adjust the width of the bars, because by default the bars are pretty thin, as you can see here; we set the width to 80, which looks a bit better. Notice that the chart is assigned to the variable p, and then, in order to show the chart, we pass it as the input argument to the st.write function so that it is rendered as a Streamlit object. And there you have it, a very simple bioinformatics web application. Feel free to modify this into another web application, in bioinformatics or any other industry, because the code is quite general and you can use it as a template for building your own personal data science project.

This video is the fifth part of the Streamlit tutorial series, where I go into detail, step by step, on how you can build a data science web application, so let's have a quick recap. In the first part I showed you the first web app you can build using Streamlit in Python: that application obtains data dynamically from the yfinance library, which retrieves stock data directly from Yahoo Finance, and then makes a simple line chart. In the second part we built a simple Iris predictor where we employed a machine learning algorithm, and in the third part we used a machine learning algorithm to classify penguins into one of three species; the concept is similar to the Iris predictor, but with the notable difference that the input parameters include three additional ordinal or qualitative variables, which required a little more data pre-processing to encode the ordinal features into binary form. In the fourth part I showed you how to deploy your data science web application onto the cloud, so that other people and your friends can access it, and we did that by deploying the application onto Heroku.

Today, in the fifth part of the Streamlit tutorial series, we're going to combine two prior videos in the development of a web application. Essentially, we're going to dynamically retrieve data from the internet by web scraping the basketball-reference.com website, then do some data filtering, all of this right inside Streamlit, and finally perform a simple exploratory data analysis by creating a simple heatmap, all in fewer than 70 lines of code. So, without further ado, we're starting right now. The code that we're going to use today is called basketball_app.py.
Upon opening the code, we see that all of it is fewer than 70 lines, and that includes the empty lines; if we deleted all of the empty lines, I would estimate it at approximately 60 lines of code. So a lot is being done here, and it occupies only a few lines of code. The data science web application we are building today is called the NBA Player Stats Explorer. Before we take a deep dive into the code, let's run it and have a look at the web application. Let's close the file for a moment and open up a command prompt: if you are on Windows, type cmd; if you are on a Mac or Ubuntu, open your terminal.

The next command will only work on my computer, because I'm going to type conda activate dp, dp being the name of the conda environment installed on my machine. If you have a conda environment on your computer, activate it by typing conda activate followed by the name of your environment; for example, if it's called myenv, you would type conda activate myenv. The reason for this lengthy explanation is that from prior videos I've noticed some misunderstanding that you have to type conda activate dp, which is not the case: if your computer is using plain Python, just proceed with following the tutorial, but if you're using conda, activate your own environment.

Let's continue. Now that I have activated my conda environment, I'll change directory to the Desktop and then to the streamlit folder, which is where all of my files are residing; in your case, change to the directory where your files for this tutorial are located. The link to this code is provided in the description down below, so check that out. The name of the app is basketball_app.py, and a point to note: the first time we run this basketball application, we'll see an error message telling us that a library is missing and that we have to install a prerequisite Python library, so let's work through this error together. Type streamlit run basketball_app.py, and a browser window pops up saying that lxml was not found, please install it. So let's close this and install lxml with pip install lxml. It's installed, so let's head back. Notice that I can type quickly here because I type the first few characters of the folder or file name and then hit the Tab key, which auto-completes the name for me; for example, I type streamlit run b and then Tab, and it autocompletes the name. Let's run the application again.

It seems to work now, and this is our NBA Player Stats Explorer web application that we are going to build today. Let's have a look at its general characteristics. On the sidebar on the left we have three input parameters: the first one is the year of the data you want to look at, and the second parameter is the team.
Notice that you can select multiple teams, and by default all of the teams are selected for you; you can then remove the teams you don't want, and the results update on the fly. Whenever I click on the x mark here, you'll see the number of rows decrease, from 683 to 665. The third input parameter is the position of the players, and we have the five traditional positions here: center, power forward, small forward, point guard, and shooting guard.

To the right is the main panel. Here we have the name of the web app, then some description, then a header followed by the data dimension, and then we display the DataFrame of our dataset. This dataset is downloaded on the fly from basketball-reference.com: we use pandas to do the web scraping and then perform some simple data filtering, which I'll show you in the code in just a moment. Once the filtering is done, we display the data accordingly, and we can also export the displayed DataFrame as a CSV file. Finally, we can look at an intercorrelation heatmap of the numeric columns, which is the exploratory data analysis I mentioned. We're only going to make one plot here, so please feel free to extend the app with other data visualizations. In this web application I made it so that the heatmap is hidden unless you click on the button; upon clicking the button, the heatmap appears. Pretty cool, right?

Let's now have a look at the code: open the Atom editor, File, Open File, and then basketball_app.py, and let's view the code side by side with the app. In the first six lines of the code we import the necessary libraries: streamlit, because we're using Streamlit to build the web app; pandas, because we use it to handle the DataFrame and also to perform the web scraping; the base64 library, to handle the CSV data download, because it encodes the string-to-bytes conversion; and finally matplotlib, seaborn, and numpy, to create the heatmap plot.

Looking further, on line 8, st.title sets the name of the web app, NBA Player Stats Explorer, which is right here on the right, and then some description of the web app is provided. You'll notice we're using Markdown: this description corresponds to this line, the bullet point corresponds to the asterisk, and the bold text corresponds to the double asterisks. We also include the data source link, basketball-reference.com, which is where we download the data. The following lines are for the sidebar: st.sidebar.header sets the header of the sidebar, with "User Input Features" as the input argument, which is right here. Then, on line 17, selected_year is the name of the variable, and this line of code displays the years we can request data for. We make it a drop-down menu by using the selectbox function.
The first input argument is the label, 'Year', which is right here; if we change it a bit, save, and select "Always rerun", you'll see the question mark appear as well, and if we take it out and save, the question mark disappears. The next input argument creates a range of numbers from 1950 up to 2019, and we reverse the list so that it starts from 2019 at the top rather than 1950, 1951, 1952 and so on; I use reversed and then convert it to a list so that 2019 is displayed at the top and becomes the default value. Notice that whenever you change the year, the corresponding DataFrame also changes: all of this data is downloaded on the fly, on demand, upon changing the input parameters, and nothing is stored locally on the server side.

The next block of code performs the web scraping and some data pre-processing; it is taken from a previous tutorial video, for which I'll also provide the link in the description, where we perform a simple web scraping of the basketball-reference.com website. The only thing needed for the web scraping is one line, pd.read_html: it reads the HTML page, and because the data is in the form of a table, pandas can easily read it. Then we drop the redundant header rows that are repeated throughout the table, and after removing those we delete an index column called Rk, because it is redundant with the index pandas provides. Finally, the function returns the pre-processed data. On line 29 we make use of this custom load_data function with selected_year as the input argument, so the function retrieves the NBA player stats data for whatever year is chosen in the drop-down menu: when you select 2019, the value 2019 goes into load_data(2019) and we retrieve the data for 2019, and likewise for any other year.

Lines 31 to 33 allow the user to select the teams to display in the DataFrame. Line 32 defines the variable sorted_unique_team: we take the team column of the playerstats DataFrame, keep only the unique values, and sort the teams alphabetically, so they are displayed from A to W here. On lines 35 to 37 we also allow the user to select the positions, and here we have specified the five traditional positions. Now let's look at the input arguments of the multiselect function, which displays all of the possible values for the team and position selectors.
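Before going further with the multiselect arguments, here is a hedged sketch of the sidebar year selector and the cached load_data function covered so far; the exact basketball-reference URL and the dropped column names are assumptions based on its per-game stats table:

```python
import streamlit as st
import pandas as pd

st.sidebar.header('User Input Features')
selected_year = st.sidebar.selectbox('Year', list(reversed(range(1950, 2020))))

@st.cache   # as used in the video; newer Streamlit versions use st.cache_data
def load_data(year):
    url = 'https://www.basketball-reference.com/leagues/NBA_' + str(year) + '_per_game.html'
    html = pd.read_html(url, header=0)
    df = html[0]
    raw = df.drop(df[df.Age == 'Age'].index)  # drop header rows repeated inside the table
    raw = raw.fillna(0)
    playerstats = raw.drop(['Rk'], axis=1)    # drop the redundant 'Rk' rank column
    return playerstats

playerstats = load_data(selected_year)
```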
The second input argument provides all of the team names and all of the position names, as defined on lines 32 and 36, and the third input argument, for both of them, sets the default values to display, and we make it the same list. Keeping the second and third arguments the same means that all of the possible values are selected by default. Let me show you what happens if I do a simple slice of only the first element: then only one position is offered, the first one, center, and if I do the same for teams, only the first team, Atlanta, is offered; the options show only one value. If I change the slice to two or three, some arbitrary number, then those are the values shown as the options and defaults. If I just pass everything, we use everything, and that's why the second and third arguments are the same. Let's save it, and now we have all of the possible values again.

Now, if we change the year to 2018, you'll notice that the data is being loaded for the first time, and it says "Running load_data". If we switch back to 2019, where the data has already been loaded, it uses the cached data, and if I switch back to 2018, which was loaded just a few moments ago, the data is displayed instantaneously. However, if I select 2017, which has never been loaded before, it loads for the first time, you see "Running load_data", and then the data is cached. This works because on line 20 we put the @st.cache decorator before the load_data function, so whenever we select a new year, the function runs once and the result is cached; the second time we use it, it is much faster, and all of the underlying data is updated as well.

Now, filtering the data: have a look at line 40. Line 40 filters the data based on the input selection in the sidebar menu, so if we deselect Dallas, Denver, Chicago, and Cleveland, the remaining selections dictate which rows are shown, and you'll see the data dimension update to 512 rows; if we take away more, the number of rows is affected accordingly. All of this is possible because of the filtering on line 40.
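Continuing that sketch, here are the team and position selectors, the filtering step, and, looking slightly ahead, the CSV download link and heatmap button that the next paragraphs explain; column names such as Tm and Pos are assumptions based on the basketball-reference table:

```python
import base64
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Sidebar: team and position selection (options and defaults are the same list,
# so everything is selected by default)
sorted_unique_team = sorted(playerstats.Tm.unique())
selected_team = st.sidebar.multiselect('Team', sorted_unique_team, sorted_unique_team)

unique_pos = ['C', 'PF', 'SF', 'PG', 'SG']
selected_pos = st.sidebar.multiselect('Position', unique_pos, unique_pos)

# Keep only rows whose team AND position are among the selected values
df_selected_team = playerstats[
    (playerstats.Tm.isin(selected_team)) & (playerstats.Pos.isin(selected_pos))
]

st.header('Display Player Stats of Selected Team(s)')
st.write('Data Dimension: ' + str(df_selected_team.shape[0]) + ' rows and '
         + str(df_selected_team.shape[1]) + ' columns.')
st.dataframe(df_selected_team)

# CSV download link: encode the DataFrame as base64 and embed it in an <a> tag
def filedownload(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()  # str -> bytes -> base64 -> str
    return f'<a href="data:file/csv;base64,{b64}" download="playerstats.csv">Download CSV File</a>'

st.markdown(filedownload(df_selected_team), unsafe_allow_html=True)

# Intercorrelation heatmap, shown only when the button is clicked
if st.button('Intercorrelation Heatmap'):
    st.header('Intercorrelation Matrix Heatmap')
    # write out and re-read the CSV to sidestep the dtype issue mentioned below
    df_selected_team.to_csv('output.csv', index=False)
    df = pd.read_csv('output.csv')

    corr = df.select_dtypes(include='number').corr()
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True   # hide the redundant upper triangle
    with sns.axes_style("white"):
        fig, ax = plt.subplots(figsize=(7, 5))
        ax = sns.heatmap(corr, mask=mask, vmax=1, square=True)
    st.pyplot(fig)
```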
Whenever we make a selection in the sidebar, it goes into the selected_team variable and the selected_pos variable, which correspond to the Team and Position widgets; whenever we update the selection, the DataFrame is updated, because the selected_team and selected_pos variables change, and we filter the data to show only the rows matching the remaining selection. All of this is done by this one line: you take the name of the DataFrame, use a bracket, and inside the bracket you put your condition. (I might make another video about selecting data in pandas.) This is very powerful, and if you work with data wrangling or data cleaning, this filtering capability will be immensely useful for your projects, particularly when you are preparing your own datasets.

Lines 42, 43, and 44 display the header, the description of the data dimension, and the DataFrame itself, the df_selected_team DataFrame that we have just filtered. Lines 48 through 52 define a custom function called filedownload: we pass the resulting DataFrame in as the input argument, and it creates this "Download CSV File" link for us. Notice at the bottom that a long generated string appears; this is made possible by the base64 library, which performs the encoding and decoding. This function is brought to you by a helpful discussion on the Streamlit forum.

Finally, let's have a look at the heatmap. Lines 56 to 68 produce the heatmap: we click on the button and it shows the heatmap, which is made possible by the if statement, if st.button with the name of the button as the argument, and then all of these lines of code underneath it. st.header sets the header here, and then we save df_selected_team to a CSV file and read it back in. Why are we doing that? Because I tried to create the heatmap directly from df_selected_team and it wasn't working, but after exporting it to a file and reading it back in, it worked perfectly; I think it has something to do with the data types inside the pandas DataFrame, but upon reading it back in there were no issues. Then we compute the intercorrelation matrix and create the heatmap with this line of code, where half of the cells are not shown because they are masked.

Do you like football? Do you like data science? If you answered yes to both, then this video is for you, because today we're going to talk about how you can merge football and data science together, and without further ado, we're starting right now. In this video we're going to cover how you can build your very own simple web application for exploring NFL player stats data. Let's get started. The first thing you want to do is fire up Google Chrome or your internet browser, click on the Seasons tab, go to 2019 NFL, click on Team Stats and Standings, and then click on Player Stats.
Then, under the standard stats, you want to click on Rushing. Please note that the data we're going to use today is the rushing data; if you would like to use something else, feel free to play around with it. So click on Rushing, and we're taken to this page, where you can see the player stats; we're going to be scraping the data from this website, so let's copy the URL.

Now let's fire up the terminal, and as always I'll activate my conda environment. On your own computer, if you have conda installed and have a conda environment, you can activate your own environment, but if you don't, you don't have to do anything. I like to use conda because it lets me keep all of the libraries, packages, and dependencies self-contained, so they don't interfere with the other projects I'm working on on the same computer. I go to the Desktop, into the streamlit folder, and into the football folder, and there is the file we need, football_app.py. Let's take a look at the code in Atom, and let me also fire up the web app.

This is the web application you're seeing here, and on the left there is a collapsible side panel with a total of three input sections. The first one takes the user input for which year you want the data from; here the default is 2019, and if you click on the drop-down you'll notice it starts from 1990, so we can change the year, which I'll show you in just a moment. The teams are detected directly from the DataFrame, and the positions are taken from the Pos column. Notice that in the web app I'm demonstrating today we have not yet done any data cleaning, and we're using only the complete rows, so there are a total of 117 rows here; I'll leave it to you as a hobby project to clean the data and see how much bigger the dataset becomes. Let me show you what the raw data looks like: there are a total of about 344 rows in the raw data, versus 117 in the complete data that we have here. Let me hide this again.

Let's look further at the functionality of the web app. If you click on the Intercorrelation Heatmap button, you'll see the intercorrelation of the variables. Now let's go through the code line by line. The first six lines import the necessary libraries that we're going to use today: first streamlit, because it allows us to build this web app; then pandas as pd, for the DataFrame we're using here; base64, for the functionality to download the data as a CSV file, since it handles the encoding and decoding; and then matplotlib.pyplot as plt and seaborn as sns, which we use together to make the heatmap plot you see here, along with numpy, which is also used in creating that plot. Now let's move on to line 8, which is the title of the web app:
NFL Football Stats (Rushing) Explorer. Lines 10 through 14 are the explanation shown here: this app performs simple web scraping of NFL football player stats data, focusing on rushing, and the Python libraries used are listed right here (actually the list isn't complete: we also use numpy, matplotlib, and seaborn). The data source is pro-football-reference.com, and notice that a couple of minutes ago I showed you how I copied this URL at the top of the page and pasted it right here as a reference.

In the load_data function, notice that the URL here is the one we are going to web scrape, and we set the year range from 1990 to 2020. The URL has a first component, which is the first segment here, then the second segment is the year, which we build programmatically by inserting the year as a string, and then we append /rushing.htm with the forward slash. Essentially this gives us exactly this URL if 2019 is selected, but if we select 2018 it becomes the 2018 URL and the data is updated. Here we set header=1 in order to read the table correctly. So this is the function for doing the web scraping using the pandas library, and as you can see the web scraping itself is done in only one line of code; the other lines essentially pre-process the dataset, meaning we drop redundant headers, repeated rows and values, and so on. Finally, we assign the result of load_data to the playerstats variable, and then we sort the teams, and you see the sorted teams right here in the User Input Features panel, which is lines 33 and 34.
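As with the basketball app, here is a hedged sketch of the year selector and load_data function for this app; the URL pattern is an assumption based on pro-football-reference.com's rushing pages, and the header-dropping details approximate what is described above:

```python
import streamlit as st
import pandas as pd

st.sidebar.header('User Input Features')
selected_year = st.sidebar.selectbox('Year', list(reversed(range(1990, 2020))))

@st.cache
def load_data(year):
    url = 'https://www.pro-football-reference.com/years/' + str(year) + '/rushing.htm'
    html = pd.read_html(url, header=1)        # header=1: use the second row as column names
    df = html[0]
    raw = df.drop(df[df.Age == 'Age'].index)  # drop header rows repeated inside the table
    raw = raw.fillna(0)
    playerstats = raw.drop(['Rk'], axis=1)    # drop the redundant rank column
    return playerstats

playerstats = load_data(selected_year)
sorted_unique_team = sorted(playerstats.Tm.unique())  # unique team names, sorted
```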
After sorting the teams we keep only the unique values, because in the raw data the team values are repetitive; here we sort by the unique values, so we see only unique team names, in sorted order. Lines 36 through 38 handle the positions, right here, where we have set them to RB, QB, WR, FB, and TE, and we also use a multiselect, meaning you can select multiple values at the same time, or delete them all and select just the ones you like, adding the values one by one.

Lines 40 and 41 filter the data based on the sidebar input, the team selection and the position selection: line 41 essentially filters the DataFrame you see here based on the selected teams and positions. Lines 43 through 45 display the header, "Display Player Stats of Selected Team(s)", the data dimension as normal text underneath the header, and then the actual DataFrame, according to line 45. Lines 47 through 55 allow us to download this DataFrame as a CSV file; let's download it, and recall that we're using the base64 library to perform the encoding and decoding of the data, and the data is downloaded into this file. The remaining part of the code essentially lets us make the heatmap shown here: if you click on the Intercorrelation Heatmap button, you'll see this heatmap of the intercorrelation between the variables, and this block of code creates it. As you can see, all of this is just under 70 lines of code, and it lets you build a very simple data-driven web application for retrieving, or web scraping, the NFL football player stats data.

In this video I'm going to show you how you can build a data-driven web application in Python for web scraping S&P 500 stock prices, and without further ado, we're starting right now. The first thing you want to do is fire up your terminal; today I'm using Microsoft Windows, and as always I'll activate my conda environment. The files are on the Desktop, and I'll provide the links to the files described in this tutorial, so check the video description; the folder is called sp500. You'll see that the only file we need is sp500-app.py, so let's have a look at that, and before doing so let's also open up the web app with streamlit run sp500-app.py.

Here we are. You'll see that the web app has a sidebar on the left with two input features: the first one lets us select the sectors, and the second one lets us select the number of companies. The data for the names of the companies in the S&P 500 is taken from Wikipedia, and I'll show you where we got it; the link is right here. The table we're going to web scrape is this one: it has the ticker symbol of the company, the name of the company, and other information such as the sector and sub-industry, the headquarters location, and also the founding date of the company.
Notice that there are two tables on the page; the second table is further down, but for the purpose of this tutorial we're only going to use the first one. Before we dive into the web application, let's look at how I created the essential functions for making this web app, so let me go to Google Colab and expand this a bit.

Here are the details of how we're going to make the web application. First, we create a custom function that web scrapes the data from Wikipedia: we define a function called load_data, put in the URL of the S&P 500 constituents page, which is essentially right here, and we know that we want this table. The prerequisite for web scraping with pandas' read_html function is that the data must be in a table; if the data were in paragraphs like this, it wouldn't work, and you would need something like Beautiful Soup and Selenium instead. Because the data is organized in a simple table, we just specify that we want the first table by indexing with zero, and the data goes into df, which we return. Upon running this cell, you'll see the scraped data in the df variable, and it looks very nice.

Let's take a look at the sectors, in the fourth column: we select the sector column from the df DataFrame and take the unique values to see how many unique sectors there are, and we find a total of 11 sectors: Industrials, Health Care, Information Technology, Communication Services, Consumer Discretionary, Utilities, Financials, Materials, Real Estate, Consumer Staples, and Energy. Next, let's aggregate the data by grouping it according to the sector name, across the 11 sectors, and show only the first company for each sector: for each of the 11 sectors in the index column we see the first example of a company name, so this represents the first company in each sector. Let's also do descriptive statistics of the sector variable: we see there are a total of 26 companies in Communication Services, 60 companies in Consumer Discretionary, 32 companies in Consumer Staples, and so on. And let's look at just the health care sector: we use the get_group function and see 63 companies, all of which belong to the Health Care sector. So this first part is data derived directly from the Wikipedia website.
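Before moving on to the price data, here is an illustrative version of that Colab exploration; the column names such as 'GICS Sector' and 'Symbol' follow the current layout of the Wikipedia table and should be treated as assumptions:

```python
import pandas as pd

def load_data():
    url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
    html = pd.read_html(url, header=0)
    df = html[0]          # the first table on the page lists the constituent companies
    return df

df = load_data()

# The distinct GICS sectors (11 of them)
print(df['GICS Sector'].unique())

# Group companies by sector: first company per sector, and companies per sector
sector = df.groupby('GICS Sector')
print(sector.first())
print(df['GICS Sector'].value_counts())

# All companies belonging to a single sector
print(sector.get_group('Health Care').head())
```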
Now let's do the second part, where we take the symbols of the S&P 500 companies and retrieve the stock price data using the yfinance library in Python. The first thing is to install the yfinance library, and now that it's installed we import it as yf. The list of company symbols comes from the df DataFrame that we scraped from Wikipedia above, and now we're going to retrieve the stock data. Using the example code from the yfinance PyPI project page, there is a download function for stock price data, and here we use a period of "ytd", year to date, meaning from the beginning of the year until right now, which is October 6, 2020. The tickers will be all of the companies in the S&P 500, so more than 500 companies, the data will be grouped by the ticker, which is the symbol of the company, and everything else we leave as the default. Let's run it; this takes a moment because it downloads all of the stock price data from yfinance and places it into the data DataFrame. Note that the interval is one day, which is also the default; feel free to change the interval to, say, one minute or two minutes, although that will generate a lot of data, or you can use monthly or even weekly intervals.

It's apparently finished; it says that two tickers, BF and BRK, could not be fetched, and 505 out of 507 completed. This is the stock price data, and you can see that the price information is grouped by the ticker symbol, so you can easily retrieve a specific company by specifying it in the bracket, and you get the data you need: 192 rows, because it starts in January and runs until right now, October, which is 192 trading days. What we do next is move the date to be one of the columns, as you can see the date is now part of the columns, and we want to use the closing price, so our new DataFrame contains two columns, the date and the close price. Feel free to play around with the price data if you want to use the open, high, or low price, but for this tutorial we'll stick to the closing price.

Now for the fun part: we make a plot of it. This is the plot of the closing price, with the dates on the x-axis and the closing price on the y-axis; you can see that the price dipped around April and then went back up, and if you compare the beginning of the year with right now, the price is about 18% higher. Let's do the same thing, but as a custom function. The reason for making a custom function is that it makes our life much easier: we only specify the name of the symbol as the input argument and it makes the plot for us, so we don't have to copy the entire code and modify it for each ticker symbol. So we make a general helper function called price_plot, and then we call price_plot with the ticker symbol as the input argument; we can change the company if we like, and the plot updates. The beautiful part is that we can even run it in a for loop, with price_plot inside the loop, iterating through all of the company names, though here we select only the first 10 companies.
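Here is a sketch of that part of the Colab exploration, continuing from the df table scraped above; the download options mirror the yfinance example, and the plot styling only approximates what is shown in the video:

```python
import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd

# Download year-to-date daily prices for all scraped tickers, grouped by ticker
data = yf.download(
    tickers=list(df.Symbol),   # 'Symbol' column of the Wikipedia table
    period="ytd",
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    threads=True,
)

def price_plot(symbol):
    """Plot the closing price of one company."""
    df_close = pd.DataFrame(data[symbol].Close)
    df_close['Date'] = df_close.index
    plt.figure()
    plt.fill_between(df_close.Date, df_close.Close, color='skyblue', alpha=0.3)
    plt.plot(df_close.Date, df_close.Close, color='skyblue', alpha=0.8)
    plt.xticks(rotation=90)
    plt.title(symbol, fontweight='bold')
    plt.xlabel('Date', fontweight='bold')
    plt.ylabel('Closing Price', fontweight='bold')
    plt.show()

price_plot('GOOGL')                   # a single company

for symbol in list(df.Symbol)[:10]:   # the first ten companies
    price_plot(symbol)
```

Because the closing prices get their own name (df_close) inside the function, the Wikipedia table df is never overwritten here, which sidesteps the naming hiccup described next.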
symbol column right here right here we have overwritten the same name so i think we should call this something else let's call it df2 and this will have to be df2 right let's do it again this is df2 and then okay it works again it should work now because we have already run the symbol here df symbol all right it's working and we want the first 10 companies all right 1 2 3 4 5 6 7 8 9 10. all right there we have it and so the proof of concept is now working and we have tried the proof of concept directly on the google code lab so the great part of making this proof of concept on the google code lab is that we could be quite flexible in working on different computers and using the same google code lab and once we're finished with that then we could start the production phase where we will deploy it to the web application in streamlit and so now we're ready to deploy it onto streamlit and let's do that and there you go we have already deployed it to streamlid a couple of minutes ago all right and so let's take a look at this web application so in the user input here we're going to have the name of the sectors where we could essentially select the sector here and then we also are able to select the number of companies that we want to show in the show plot area so if we have two companies then the show plot area will also have two buffs up to a maximum of five and the maximum of five could be changed to another value that you would like okay and so i'm using five just an arbitrary number and here we have five plots and the great thing about this is that you could also modify this web application so that it will be tailored to your own country's stock price so let me know in the comments how you're making use of this or how you are modifying this web application and if you would like to share your creation let me know and i could share it on one of the tutorial videos where i could show how to recreate the web application that you have made and so drop me an email and the email will be provided in the description of the video okay and so let's take a line by line look at the explanation of the code so here the first seven lines will be importing the necessary libraries so first one is the streamlit the second one is pandas and we're gonna use streamlit because this web application is built on top of the streamlit library and we're going to use the pandas because of the data frame by which we are showing the data and then the base 64 will allow us to encode the data so that we are able to provide it as a csv file and here we're going to use matplotlib for making the plots and apparently we didn't use cborn and so let me delete that and also from here all right and we also have y finance we're gonna add it here all right and so that left us with a total of six library and then the next one is numpy and why finance let's see did we use numpy no we did not use numpy sorry because i was using a previous web application as the template for this one so even less now we're having only a total of five libraries so why finance the fifth one is going to allow us to retrieve the stock price for the s p 500 and so you're going to see here that we are making use of only five python libraries all right and so the st.title will be the function that we're going to use to make the title of the web application here s p 500 app shown as a both text here and then we're going to make use of the markdown function in order to display the details of the web application and also the libraries that we're using in python 
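For orientation before the line-by-line walkthrough continues, here is a condensed sketch of what the finished S&P 500 app roughly looks like in code. It is reconstructed from the description above, so the Wikipedia column name "GICS Sector", the slider range, and the plot styling are assumptions rather than the exact file from the video, and the streamlit calls follow the 2020-era API used there.

```python
# a condensed sketch of the S&P 500 app described in this section;
# column names such as "GICS Sector" come from the Wikipedia table and may change
import streamlit as st
import pandas as pd
import base64
import matplotlib.pyplot as plt
import yfinance as yf

st.title("S&P 500 App")

@st.cache
def load_data():
    # web-scrape the list of S&P 500 companies from Wikipedia
    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
    return pd.read_html(url, header=0)[0]

df = load_data()

# sidebar - sector selection
st.sidebar.header("User Input Features")
sorted_sector_unique = sorted(df["GICS Sector"].unique())
selected_sector = st.sidebar.multiselect("Sector", sorted_sector_unique, sorted_sector_unique)

# keep only the companies belonging to the selected sectors
df_selected_sector = df[df["GICS Sector"].isin(selected_sector)]

st.header("Display Companies in Selected Sector")
st.write(f"Data Dimension: {df_selected_sector.shape[0]} rows and {df_selected_sector.shape[1]} columns.")
st.dataframe(df_selected_sector)

def filedownload(df):
    # encode the dataframe as base64 so it can be offered as a CSV download link
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()
    return f'<a href="data:file/csv;base64,{b64}" download="SP500.csv">Download CSV File</a>'

st.markdown(filedownload(df_selected_sector), unsafe_allow_html=True)

# download year-to-date daily closing prices for the first few tickers with yfinance
data = yf.download(
    tickers=list(df_selected_sector[:10].Symbol),
    period="ytd",
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    threads=True,
)

def price_plot(symbol):
    # keep only the date and closing price of one company, then plot it
    df2 = pd.DataFrame(data[symbol].Close)
    df2["Date"] = df2.index
    plt.fill_between(df2.Date, df2.Close, color="skyblue", alpha=0.3)
    plt.plot(df2.Date, df2.Close, color="skyblue", alpha=0.8)
    plt.xticks(rotation=90)
    plt.title(symbol, fontweight="bold")
    plt.xlabel("Date", fontweight="bold")
    plt.ylabel("Closing Price", fontweight="bold")
    return st.pyplot()

num_company = st.sidebar.slider("Number of Companies", 1, 5)

if st.button("Show Plots"):
    st.header("Stock Closing Price")
    for symbol in list(df_selected_sector.Symbol)[:num_company]:
        price_plot(symbol)
```

Running it with something like `streamlit run sp500-app.py` should bring up the sector filter, the download link, and the closing price plots described in the walkthrough that follows.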
and also the data source which is taken from wikipedia and then we have the st.sidebar dot header it means that we're going to create a header called user input features and then we're going to put it inside the sidebar to the left here on line number 19 we're going to say that we want to catch the data if it has already been run for the first time so that the second time or subsequent time it wouldn't need to redownload the data again and again and so it only needs to do that only once for the first time so lines number 20 to 24 is a custom function that we have taken directly from the google code lab where we have experimented with the creation of this web scraper of the s p 500 data and so we just paste it right here and then we're going to assign the web script data into the df data frame and then we're going to group by the sector name which is right here we're going to group by the it's not shown here okay right here we're going to group by the sector names on line number 27 and then lines number 30 and 31 we're going to display the sector name as an option where you could select you want to select only particular one here we are only selecting communication services and so we're going to see that there are 26 companies here or we could also add one by one the companies that we like i mean the sectors of the company that we like all right and then line number 34 is only going to filter out from the entire data frame the sectors that you have selected in the user input feature panel here so it's going to use the is in function in order to see which are coming from the selected sectors right here three selected sector which is right here selective sector coming from the sidebar and multi select function multi select means here we have multi selection and now the data will be in the df selected sector and then we're going to write it out here in the st data frame and the input argument is df selected sector which is coming from here and it's filtering all data belonging to the three selected sectors here or it could be four right here okay and in lines number 37 it's going to write out the dimension right here number of rows and number of columns and the st.header will be the function to display the header here all right and lines number 42 until 46 is going to be the custom function to allow us to decode encode the data and make the csv file of this data available for download right here and then you get access to the csv data all right and then the second part is the finance data that i have shown you just a moment ago on the google code lab and so here we're going to make use of the constant function that we have already done just a moment ago and then for this example we're just going to limit it to about 10 companies otherwise it might have taken longer to create the web application okay and so let's have a look further and then the custom function to make the plot was also taken from the google co-lab and so lines number 52 until 61 will download the data of the stock price directly from y finance lines number 63 until 73 is the custom function to create the plot that we're going to see here in the show plot okay in line number 75 it is the slider for you to select the number of companies so if you want to make it into 10 then you could do so as well multiply that to be 10 and then the number here will then be 10 and now you could have 10 plus okay you could have 10 plots now and make sure to play around with this number as well make sure that this number is greater than this 
number okay because it will be the total number of stock price data that you're going to have let me change it back to five and let's say that you don't want to hide the plots here you just want to show it you could hide that function the if let's see it's indented okay let's see again all right and now you have the plots immediately if this is 10 then you're going to get the 10 plus right here okay do you like cryptocurrency do you like data science if you answered yes to both questions then this video is for you because today i'm going to show you how you could build a cryptocurrency price web application and without further ado we're starting right now okay so the first thing that you want to do is fire up your terminal and then for me i have to activate my conda environment and you could do the same for your own environment as well and i'm going to the desktop because that's where i keep my files for the streamlit tutorials and i'm having the data in the crypto price folder all right and so in here you'll notice that we'll have two files the crypto app itself and the logo so let's open up the web application so i have to run streamlit run crypto price app dot py all right and here you go this is the web application that we are going to be building today and this is the cryptocurrency logo that i have drawn and it is the logo.jpg file so to the left here is the input parameters in the side panel so here you can select the currency for the price at default it will be using the usd and you have a selection of three choices usd btc or eth and then here we'll be listing the 100 top cryptocurrency okay and then here it will ask for the option to select how many top cryptocurrency should be displayed in the data frame table here and also in the chart here another option that you get to select is the percent change for the time frame the percent change meaning the price change that has occurred within the last seven days and you have a selection of three choices within seven days within 24 hours or within one hour and then the last option here that you could select is do you want to sort the values in here in the chart so you see here that the green color will represent the price change that is changed for the positive gain while some of the cryptocurrency will have a negative pricing here meaning that when it's compared between the first day and the seventh day when the seven days selected if the price change is negative it means that the price has reduced however if the price has increased then there is a gain okay so it's essentially the gain of the price in green or the loss in red okay so this is the cryptocurrency web app that we are going to be building today and you can notice that here the interface and the layout of the web application is full screen now because before the web application will be a bit centered at the middle and so we're going to be using the estate of the whole entire width of the monitor here and then the other exciting update from streamlit is that aside from being able to use the entire width of the screen we are also able to divide our contents into multiple columns so here we have a total of three which is the side panel so we're going to consider the side panel as the first column but also it should be noted that it could also be collapsed as well and then you're also going to have the entire screen for the content of the web application however for this web application we are going to consider the sidebar as of column one and then we're also going to consider 
this part here the data frame as column two and then we're going to consider the bar chart here as column three okay so before we would have only a sidebar and the main content so essentially we would be having like two columns however in the recent update that streamlit has made we are able to have multiple columns so it could be more than three we could make it four or we could make it five and so the multiple column will allow us to partition the content in a very aesthetic way meaning that we could group contents that are related in a particular column and so that would be more visually appealing okay so let's proceed with interpreting the meaning of the code line by line so let me open up the atom okay so i'm working on an automl web application right now so that's going to be for the future video right here crypto price okay so let me make it half the screen okay so you'll be noticing here that the first 11 lines will be the libraries that we are going to be using today and so you will be noticing that we have more than 10 lines in this tutorial in other tutorials we'll have like about six or five lines meaning like five libraries to use but in this tutorial we'll be using a lot of libraries here so the first one is the streamlit which is the basis for this web application so it's the web framework that we are going to be using and then we're going to be using the logo so we're going to use the from pil library and then we're going to import the image function and then we're going to make use of the pandas in order to show the data frame and then import base 64 because we need to encode decode the data in order to allow the user to download the cryptocurrency data here as a csv file because we're making the plots here we're needing the matplotlib all right so actually it's not 11 but only nine libraries okay so probably taken it from the previous project so i've just deleted the numpy and the seaborn so here we're using only the matplotlib for the bar chart and numpy was not used in this project and here we're going to make use of the beautiful soup in order to web scrape the data from the coin market cap website and the basis for the web scripting was taken from the article from brian feng and the article is entitled web scraping crypto prices with python and it is on the medium platform so you could click on this link in order to read the full article by brian so i have adapted his code in the tutorial in order to web scrape the cryptocurrency price here and make it into a web application for our tutorial and so the process of web scraping will require the request library the json and also the time here as well all right so we're finished with importing the libraries and then it should be noted that in order to make use of the new feature particularly the the page width that i have mentioned and also the extra column that is a new feature you need to upgrade your streamlit if you already have it installed on your computer however if you haven't yet installed it then you could install a fresh version but in order to upgrade it you need to type in pip install dash dash upgrade and then streamlit okay and because it is a new feature streamlit has probably used the term beta in front of the option here set page config and then the layout will be equal to wide so this will allow us to expand the content to the full width of the page so let's try commenting it out and see what happens all right here you go you see that when we comment out the page width it will be centered so you see 
that there will be white space to the left and to the right and if we use the new feature it will be expanding to the entire paste width here so we have some extra real estate for the plots and for the data frame and considering that we have over 100 cryptocurrency here it's always nice to have some extra room to show the plot and the data frame so we're going to make use of this new feature here all right and lines number 21 until 29 is going to be the logo here which is line number 21 and 23 where we import the logo.jpg file and we're using a width of 500 so the width of the image will be 500 pixel and then the title here is crypto price app right here crypto price app in the st.title and then the description shown here is in the st.markdown function right here and then another new feature here is the about section which is hidden inside the plus symbol here if we click on it it will expand and so this is part of the expander function so i'm calling it the expander bar variable equals to st.beta expander and so as always this is a new feature so they include it by using the beta in front beta so when we're using the expander bar we put it here and then followed by the markdown function and then we include the following bullet points here in the markdown syntax all right and then lines number 43 and 44 we are essentially setting the page layout of the web app so we're gonna create a new variable called column one equal to st dot sidebar which is the sidebar here and then we're gonna create two variables at the same time column two and column three column 2 is right here the data frame and column 3 is right here the bar plot so notice that we have the values of 2 1 inside here it means that we want to have the data frame column to be two times greater than the bar plot column meaning that the width of the second column here will be two times greater than the third column okay so you can see here that the width of this column is twice bigger than the third column which is right here if it's one to one it means that both columns will have equal size okay they're equal and if it's one and then three it means that you want to guess you are correct and so the bar plug will have three times wider width than the first column okay so we're leaving it at two to one all right and let's hop on to the next section here so i'm commenting this part to be side panel or the sidebar plus the main panel so column one dot header is the input option so this is column one meaning the sidebar and so input options is this one the header here and then currency price unit column one dot select box select currency for price it's right here so this select box will allow the user to select which cryptocurrency price unit to use would it be usd would it be bcc or would it be ath and upon selecting the options i mean the data frame will be updating the price unit from usd right here the price unit from usd to bcc or to eth and so the currency price unit will be used later on in the code okay i'm going to show you later on where it is used so this is the ui or the user interface part all right and so here comes the fun part lines will catch the following data meaning that the second time around where you reload the web page it will not re-web scrape the data so it's going to perform the web scraping only once the first time and it's going to keep a catch of that for the subsequent use so that would be beneficial for you when you're trying to improve this web application incrementally without even have to re-perform the 
web scraping over and over again so that will save a lot of resources so as i mentioned earlier on in this video the web scraping of the coin market cap data was performed according to the article by brian feng and you could read the full detail in his article on medium and so i have adapted that into this customized function load data so some of the data that we're going to web script from the coin market cap so let's have a look at the coin market cap here oops okay i have to change the coin market cap okay let me copy the link and i'll put it here okay so this is the website of the coin market cap so it's displaying the price of the various cryptocurrency the first top hundred and so this is essentially what we are going to web scrape today the top hundred price here so we're gonna get the data of the the name of the cryptocurrency the symbol bitcoin the name of it right and then the symbol of it is btc the price the 24 hour price change the seven day price change and also the one hour price change the market cap value and also the volume so we're gonna web script that today here and all of the data will be placed inside the df variable okay lines number 99 and and 100 let's have a look here line number 99 and 100 will be multi selection of the web app okay it's right here so line number 99 and 100 will give us this multiple selection box here so the top 100 will be shown here and so you could select which cryptocurrency you want to be shown here let's say we delete everything and then we're selecting btc eth right what else bnb right and so we selected only three so far and three are shown here okay so let's use the default one of 100 all right let's continue so upon our selection of the coins here 102 will display only the selected coins so if we selected five only five will be shown in this data frame right here and then it will also be shown in the bar plot as well so line 102 will do the data filtering line number 104 let's see number of coins to display it's right here display the number of coins here so it's going to be a slider from 1 through 100 okay one through a hundred all right so if i have it to 55 it's going to be showing the first 55 coins see index 54 which is the 55th line and then the bar plot will be also 55 coins as well or cryptocurrency all right and so let's have a look at line number 106. 
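Before moving on with the remaining line numbers, here is a condensed sketch of the part of the crypto app covered so far. st.beta_set_page_config, st.beta_expander and st.beta_columns were the beta names of these functions at the time (current releases drop the beta_ prefix), and the CoinMarketCap tag id and JSON key names inside load_data follow the approach in Bryan Feng's article, so treat them as assumptions that may need updating if the site layout changes.

```python
# a sketch of the layout and data-loading part of the crypto price app,
# written against the 2020-era streamlit API used in the video
import streamlit as st
from PIL import Image
import pandas as pd
from bs4 import BeautifulSoup
import requests
import json

# expand the app to the full page width (st.set_page_config on current versions)
st.beta_set_page_config(layout="wide")

image = Image.open("logo.jpg")          # the hand-drawn logo shipped with the app
st.image(image, width=500)
st.title("Crypto Price App")

# collapsible "About" section
expander_bar = st.beta_expander("About")
expander_bar.markdown("""
* **Python libraries:** base64, pandas, streamlit, matplotlib, BeautifulSoup, requests, json, time
* **Data source:** [CoinMarketCap](http://coinmarketcap.com)
""")

# layout: sidebar plus a wide dataframe column and a narrower chart column
col1 = st.sidebar
col2, col3 = st.beta_columns((2, 1))

col1.header("Input Options")
currency_price_unit = col1.selectbox("Select currency for price", ("USD", "BTC", "ETH"))

@st.cache
def load_data():
    # CoinMarketCap embeds its listing data as JSON inside a <script> tag;
    # the tag id and the key names below are assumptions to verify against the live page
    page = requests.get("https://coinmarketcap.com")
    soup = BeautifulSoup(page.content, "html.parser")
    data = soup.find("script", id="__NEXT_DATA__", type="application/json")
    site_json = json.loads(data.contents[0])
    listings = site_json["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]

    rows = []
    for coin in listings:
        quote = coin["quotes"][0]
        rows.append({
            "coin_name": coin["name"],
            "coin_symbol": coin["symbol"],
            "price": quote["price"],
            "percent_change_1h": quote["percentChange1h"],
            "percent_change_24h": quote["percentChange24h"],
            "percent_change_7d": quote["percentChange7d"],
            "market_cap": quote["marketCap"],
            "volume_24h": quote["volume24h"],
        })
    return pd.DataFrame(rows)

df = load_data()

# sidebar controls: which coins to show, how many, which time frame, whether to sort
sorted_coin = sorted(df["coin_symbol"])
selected_coin = col1.multiselect("Cryptocurrency", sorted_coin, sorted_coin)
df_selected_coin = df[df["coin_symbol"].isin(selected_coin)]

num_coin = col1.slider("Display Top N Coins", 1, 100, 100)
df_coins = df_selected_coin[:num_coin]

percent_timeframe = col1.selectbox("Percent change time frame", ["7d", "24h", "1h"])
sort_values = col1.selectbox("Sort values?", ["Yes", "No"])

col2.subheader("Price Data of Selected Cryptocurrency")
col2.dataframe(df_coins)
```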
so same thing it's going to do the data filtering of the selected coins that you selected line number 108 percent change time frame okay as i mentioned earlier on you could select the time frame that you want to be used so by default it's using the seven days time frame here for the percent change and you could change that to like 24 hours if you like or even one hour so the price change that happens within the last hour okay and then you're gonna see which cryptocurrency had a gain a positive gain or a significant loss five percent loss and this one is about the first one is about three percent gain within the last hour okay line number 114 sidebar sorting values okay so right here so you're going to see that the values are sorted from highest to lowest gain and also loss as well so if the sort value is yes it's going to sort the value if it's no then it's not sorting the value so you see here that the values are not sorted and the values are in the same order here so the first listed coin ranked number one is btc so it's going to be shown at the bottom here the second rank eth will be shown here usdt will be shown as the third one from the bottom xrp will be shown as the fourth one so it's going to be in the same order as this data frame here but a bit inverted okay the first round will be at the bottom second round will be the second from the bottom etc all right 117 line number 117 price data okay so this is the data frame right here data dimension so it's right here it's going to print out that that there are 100 rows and eight columns here eight columns column two data frame the f coin so this is df coins is the data frame here line number one two two one two eight it's going to be allowing us to download the data here as a csv file and then lines number 133 preparing the data for the bar plot so here are here in column three we're going to make the bar plot so the following lines of code or the blocks of code here will be preparing the data for making the bar plot so what it will essentially do let's have a look so here we're going to select some of the columns so the coin symbol and then we're gonna select the price change the percent change of one hour 24 hour and seven days and we're going to create a new data frame called df change all right and then we're going to set the index to coin symbol so coin symbol will be moved to the index of that data frame and then we're going to create some additional new columns here we're going to call it positive in front of the one hour positive in front of the 24 hour and positive of these seven days and we're gonna do a conditional here so if the percent change that happens in the last hour if it is positive meaning it has a value greater than zero we're going to assign it to the positive column for the given time frame of one hour and so for the time frame of 24 hour if the value is greater than zero it will be assigned a value of one for the positive percent change 24 hour and same thing for the seven days period if it has a value of greater than zero it will be assigned a value of one in the new column called positive percent change and if it has a value of one in the positive percent change here it means that it will have a green color however if it has a value of less than zero it will have a red color okay so that will be shown in the following lines 144 until 168. 
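A rough sketch of the bar plot preparation and the time-frame conditional described here, continuing the variable names (df_coins, col3, percent_timeframe, sort_values) from the previous sketch; the column names are assumptions carried over from it rather than the exact code in the video.

```python
import matplotlib.pyplot as plt

# keep the symbol and the three percent-change columns, then flag gains vs losses
df_change = df_coins[["coin_symbol", "percent_change_1h", "percent_change_24h", "percent_change_7d"]]
df_change = df_change.set_index("coin_symbol")
df_change["positive_percent_change_1h"] = df_change["percent_change_1h"] > 0
df_change["positive_percent_change_24h"] = df_change["percent_change_24h"] > 0
df_change["positive_percent_change_7d"] = df_change["percent_change_7d"] > 0

col3.subheader("Bar plot of % Price Change")

def change_barplot(column, flag_column):
    # horizontal bar plot, green for a positive change and red for a negative one
    plot_df = df_change.sort_values(by=[column]) if sort_values == "Yes" else df_change
    plt.figure(figsize=(5, 25))
    plt.subplots_adjust(top=1, bottom=0)
    plot_df[column].plot(
        kind="barh",
        color=list(plot_df[flag_column].map({True: "g", False: "r"})),
    )
    col3.pyplot(plt)

# one branch per time frame selected in the sidebar
if percent_timeframe == "7d":
    change_barplot("percent_change_7d", "positive_percent_change_7d")
elif percent_timeframe == "24h":
    change_barplot("percent_change_24h", "positive_percent_change_24h")
else:
    change_barplot("percent_change_1h", "positive_percent_change_1h")
```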
okay and so here you're going to see that for the creation of the bar plot we're going to make use of the conditional so we're going to have three blocks here if else if and else so the first we'll check whether it is seven days and the second block will check whether it is 24 hours and the third one will be one hour so right here did we select seven days or is it 24 hour or is it one hour and the code is right here let me find it for you time period right here percent time frame select box okay and the option is seven days 24 hour and one hour okay percent time frame which is right here percent time frame okay so it's doing a conditional check if it's seven days it's gonna create the seven days plot if 24 hour is selected it will create 24 hour version so here you see that it says seven days here and this is the plot for the seven days and if i choose the 24 hour then it will go to this second block here and it will create the 24 hour as you see here bar plot and if i selected the one hour and it will go to the third block here and here you notice that it creates the one hour bark plot okay and there you have it a cryptocurrency price web application okay so a couple of days ago i released a video on how you can develop a simple web application in python using the streamlit library and so in this video we're going to incorporate an additional feature which is machine learning capability into the web application and so without further ado let's get started so the web application that we're going to build today is called the simple iris flower prediction app and so the app will do as it says it will predict the iris flower type given the four input parameters okay so let's have a look what does the web application looks like so the first thing that you want to do is head over to your command prompt so if you're on a windows you want to click on the search icon type in cmd if you have a mac or a linux you want to open up your terminal and because i have everything in my environment i will activate it by conda activate dp then move to the desktop and then i'm going to run the web server by typing in streamlit run and then the name of the file enter and so the file to this web application will be included in the video description so you want to check that out in order to follow along with this tutorial so notice here that it's going to spawn up a website for you on your local computer and so the url here is localhost colon and then followed by the port number which is 8501 on my computer okay so this is the web application and the first line here is the header of the web app let's have a look what does this looks like let me expand it a bit okay so as you can see here the web application is comprised of two components the first component is the main panel here shown to the right and the second component is shown to the left which is the sidebar panel and so the sidebar panel here as it is called here user input parameters it will accept the input parameters comprising of the four features that we will be using to make the prediction and under the hood the prediction will be made using the random forest classifier and to the right here upon selecting your user input values for the four parameters here and so you will see that the default values are provided in the slider bar here so we can see that simple link has a default of 5.4 and if we modify the number here to the right in the table we're gonna see that number also modifies and updates as well same thing if i modify the other variables 
you will see that the numbers inside the table here will be updated and correspondingly the prediction probability and the prediction will also be updated as well and so the model will be applied to make the prediction based on your input parameters so if you change the input parameters the model will be used to make the prediction once again as shown here if i modify it then the predictions will be created okay pretty neat and so let's have a look at the corresponding code which makes all of this possible so let's head over back to the code and let me give you a line by line breakdown of the functionality of the code here okay so i noticed that we didn't use numpy so we're going to delete that and now it's well under 50 lines of code 49 now so in the first four lines we're going to import the libraries that we're going to use in this app and lines number 6 through 10 will be a simple markdown text giving the name and description of the app so let me open it side by side with the web application so the header here is right here okay and the text here this app predicts the iris flower type is right here and so i can make it bold save it and it doesn't change so i have to say always rerun so it will be updated automatically all right so you see that it's bold now and so st dot sidebar dot header on line number 12 will be the name of the header of the sidebar panel right here user input parameters and so notice that if i take out the sidebar it will move to the right to the main panel okay so having it in the sidebar panel will be possible if we use the dot sidebar okay so let's save it and it'll move back to the sidebar okay and lines number 14 through 24 will be a custom function used to accept all of the four input parameters from the sidebar and it will create a pandas data frame and the input parameters will be obtained from this sidebar as shown right here to the left hand side and the text here will represent the name shown here so actually i could modify the name as such i can make it capital and then add a space and save it and notice that the name also updates so why don't i do that for the other ones as well otherwise it might be confusing whether it is a variable name or not so this could be any text that you want all right and so the name is updated here and so the first value here represents the minimum value and it is 4.3 and the second number here represents the maximum value which is 7.9 and the third value represents the current selected value or the default value which is 5.4 so here 4.3 7.9 and 5.4 and so if you want to change the default value to something else like 5.8 then it would be changed to 5.8 okay let me change it back to 5.4 all right so here we're gonna use the custom function that we built above user input features and then we're going to assign it into the df variable and this will be on line number 26 okay and lines number 28 and 29 will be right here user input parameters so you can see that in just two lines of code we could have the section header name and the corresponding table below so it's just a simple print out of the data frame and line number 31 will be essentially just loading in the iris dataset line number 32 will assign the iris.data into the x variable and the iris.data are essentially the four features comprising of the sepal length sepal width petal length and petal width and this will be used as the x variable and it will later be used as the input argument and the y variable here will be assigned the iris.target so putting all of that together the whole app looks roughly like the sketch below
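The sketch below is reconstructed from the description above, so treat the exact slider limits, header strings and file name as approximations rather than the exact file from the video.

```python
# a sketch of the simple iris flower prediction app described in this section
import streamlit as st
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

st.write("""
# Simple Iris Flower Prediction App
This app predicts the **Iris flower** type!
""")

st.sidebar.header("User Input Parameters")

def user_input_features():
    # each slider takes (label, min, max, default); the defaults are roughly the feature means
    sepal_length = st.sidebar.slider("Sepal length", 4.3, 7.9, 5.4)
    sepal_width = st.sidebar.slider("Sepal width", 2.0, 4.4, 3.4)
    petal_length = st.sidebar.slider("Petal length", 1.0, 6.9, 1.3)
    petal_width = st.sidebar.slider("Petal width", 0.1, 2.5, 0.2)
    data = {"sepal_length": sepal_length,
            "sepal_width": sepal_width,
            "petal_length": petal_length,
            "petal_width": petal_width}
    return pd.DataFrame(data, index=[0])

df = user_input_features()

st.subheader("User Input parameters")
st.write(df)

# load the iris dataset and train a random forest classifier on every rerun
iris = datasets.load_iris()
X = iris.data
Y = iris.target

clf = RandomForestClassifier()
clf.fit(X, Y)

prediction = clf.predict(df)
prediction_proba = clf.predict_proba(df)

st.subheader("Class labels and their corresponding index number")
st.write(iris.target_names)

st.subheader("Prediction")
st.write(iris.target_names[prediction])

st.subheader("Prediction Probability")
st.write(prediction_proba)
```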
the iris.target values are the class index numbers of zero one and two and on line number 35 we will be creating a classifier variable comprising of the random forest classifier and on line number 36 we're going to apply the classifier to build a training model using as input arguments the x and the y data matrices okay and in line number 38 we'll make the prediction 39 will give you the prediction probability lines number 41 and 42 will be just a simple print out of the class labels and their corresponding index numbers lines number 44 and 45 will give you the prediction and so the prediction here is the class label of being either setosa versicolor or virginica lines number 48 and 49 will give you the prediction probability and so the prediction probability will tell you what is the probability of being in one of the three classes so here it is predicted to be iris setosa so what's the probability of it being a setosa so the probability is a hundred percent so if we change the input parameters a bit we're gonna see that now it is still predicted to be setosa but the probability of being setosa is slightly reduced to ninety percent and there is a seven percent chance of it being an iris versicolor and a three percent chance of it being an iris virginica okay and there you have it a simple iris flower prediction app so in the previous video i have shown you how you can use an alternative to the iris data set called the palmer penguins data set and so in this video i'm going to show you how you can develop a web application for the palmer penguins dataset and so without further ado we're starting right now so as briefly mentioned we're going to use this palmer penguins dataset that is provided by the r library called palmerpenguins so for your convenience i'm going to provide that data set on the github of the data professor and so make note that this data set that we're going to use today was derived from this github library and so i'm going to show you where you could have access to the data that i have already exported out from the palmerpenguins library package so it's right here in the data professor data repository find penguins cleaned so i have already cleaned the data set so you could also make use of this okay so we're gonna use this in this tutorial okay so let's head to the working directory so i'm gonna provide you the links to all of the code files that we're gonna use in this tutorial okay so before we begin let's have a quick recap and so in the first part of this tutorial series on streamlit i have shown you how you could use data directly from the yfinance library and how we could display a simple line chart in part two i have shown you how you could build a simple classification prediction web app for the iris data set and in this part three we're going to use the palmer penguins data set in order to make a classification web application so let's have a look at this data set okay so it has already been cleaned and here there are a total of seven columns so we have species island bill length bill depth flipper length body mass and sex and let's scroll down and so there are a total of 334 rows so not including the first row which is the heading there are a total of 333 and so we could see that we have already deleted some of the missing values and so it should be noted that the missing values were deleted entirely so that is the simplest approach that i'm using and you could feel free to do some imputation of the data set in order to retain
more of the data so the missing value that were deleted could be less than 10. and so please feel free to provide the link to your github page where you have applied some unique imputation approach and perhaps i could also include it in the github of the data professor as well so all of you guys all of us can have access to your imputed and cleaned data set and so in the meantime we're going to use this data set that i have already cleaned and it is called penguins underscore cleaned.csv and so in part two where we built a simple iris classification web application you might notice that the code is building the prediction model every time that we load the file in and every time that we make adjustment to the input features it's going to rebuild the model over and over and so as some of you have pointed out this particular flaw of the code i totally agree with you and so the previous version was built like that for the simplicity of the tutorial and so in this tutorial we're going to use another approach where we could beforehand build a prediction model pick all the object which is to save it into a file and then within the streamlit code we're going to read in the saved file and so the advantage of that is that there is no need to rebuild the model every time that the input parameters are changed and so let's have a look at that okay so the code that we're using is called penguinsmodelbuilding.py and let's take a look at the code let's edit with adam all right so here so we're going to use pandas spd and then we're going to read in the csv of the penguinsclean.csv here which is also provided in the same directory as you can see here and then we're going to take this penguin's data frame take the data and put it into the df variable and then we're going to define the target and the encode variable according to the excellent kernel kaggle provided in this link from pratik and so kudos to pratik for the code that we're using as the basis of this tutorial and so here we're going to use ordinal feature encoding in order to encode the qualitative features such as species island and sex and so the objective of this tutorial we're going to predict the species of the penguin and so if you would like to predict the sex of the penguin you could replace species with sex and then you could put species in here okay so actually this was the exact parameters used by protic in his kaggle kernel but in this tutorial i have modified it a bit by using the species as the target where we're going to predict the species of the penguin and the sex and island will be used as the input parameters okay so this block of code here will be encoding the sex and island columns and in this block of code here it's going to encode the target species and in this line of code here we're going to apply this custom function in order to perform the encoding and so in this two lines of code we're going to separate the data set into x and y data matrices in order to use it for model building and scikit-learn right because here x will be the input features and y will be the species so in x we have six features and in y we have one feature and so here we're going to build a random forest model and from sklearn.ensemble we're going to import random forest classifier and we're going to assign the random force classifier to the clf variable and then we're going to use the fit function in order to build the model using x and y as the import argument and then finally here we're going to save the model using the pickle library and we're 
going to use the pickle.dump function and as input argument we're going to use the clf which is the model that we have already built and then we're going to open or we're going to create a file called penguins underscore clf dot pkl okay so let's close the file and run this code so we could do this right inside the command line so i'm opening up the command prompt heading over to desktop going into the streamlet folder going into the penguins folder and then i'm going to activate my environment and then we're going to run the code in model building python so i have to make sure to type in penguins dash and then the tab function okay there you have it all right so as you can see the file popped up here and the pickled file has been created successfully all right so we're going to copy this to the previous folder in here okay so let's have a look at the penguins app here okay so the first five lines we're going to import the necessary libraries so here we're going to import srimlet sst import pandas spd import numpy snp import pickle and from sklearn.ensemble import random forest classifier okay and in this block of code here is the title in markdown format and the corresponding description of this web application so why don't we have a look side by side so let's resize the window a bit cd desktop cd streamlet conda activate dp penguins okay streamlit run penguins dash app in order to run the application so it's popping up the window here okay so this is the finished application that we're gonna build today and as you can see if we change the input parameters the prediction here will be changed automatically so we could see right and we're also going to get the corresponding prediction probability as well and so it should be noted here that this web application was much more difficult than using the iris data set partly because of the issue with the two qualitative features that we're using so the thing is with iris data set if we're using that it's going to contain only the quantitative features so there won't be any ordinal features like sex or island and so under the hood we have to encode the ordinal features and for example for the island feature here we're going to create three additional columns called island bisco island dream island torgersen and for each of this three feature we're going to have binary value one or zero it has the island being visco if the island bisco is having a value of one and so if the island biscuit has a value of one for a particular penguin then it will also have corresponding value of zero for dream and corresponding value of zero for torgersen and so the same thing for sex we're going to create two additional features sex male sex female and for input feature here if the sex is male then the sex male feature will have a value of one and the sex female will have a value of zero so therefore we will create five additional features on top of the four features that we have that are quantitative so that will bring us to a total of nine features so that was a bit complicated but we have already solved the issue for you guys and the code is inside here so let's have a look so here we're using st.sidebar dot header user input feature and so that is the name of this sidebar heading and then we have st.sidebar markdown and so we're going to provide example of the csv file and the link to the csv file that is an example is called penguins example and so you can see that we're providing the link to this csv file in markdown format and so you could click on it and it 
will bring you to this data set so in order to download this you would have to right click and then save link as in order to do that so the reason for having the example csv input file here is because i have been receiving some comments whether i could include a feature where the user could upload their input file and so in this tutorial i'm going to show you that okay so here we're going to download the csv file and notice that it's providing the extension.txt and we have to make it csv save it all right so this is the input features that we're going to use and we're going to upload the file and there you go the uploaded file are used as the input features here and a prediction is being made here okay so it predicts this input as a delhi and with corresponding prediction probability okay so as you can see there are two possibilities for the input features so the first option is to import the file as csv format and the second possibility is to directly input the parameters by the slider bar and so under the hood the code will be using one or the other as the input for the predictions to be made and so for doing that we're going to use the if else conditional so let's have a look further so in line number 22 here it is using the st.sidebar file uploader and so it will give us this functionality to allow us to upload files and so the uploaded file will go into the variable called uploaded file and so we're going to make use of conditionals here and so there will be two scenarios we're gonna use if there is an uploaded file and the uploaded file is not empty not none meaning that if the uploaded file is not empty then we should create a input df variable and then we're going to read in the uploaded file or in else then we're going to run this block of code here and so the block of code here will essentially define a function that will accept the input parameters from the slide bar here okay so as you can see the conditional have two possibilities if there is an uploaded file create a data frame and read in the uploaded file else read in the input parameters directly from the slider bar okay and so notice that the contents of the input will be saved into the same input df variable so that will be easy for the following blocks of code so previously i've shown you now how you can import the library write the header of the web app shown here and along with the description and then the header of the sidebar along with the link to the example csv file here and then the upload functionality shown here and then in this block of conditional if else regarding the input features if there is a file to be uploaded or if not then we're going to use the slider bar as the input okay so now the fun part comes in reading in the data from the penguins underscore clean dot csv and then we're going to drop the species column because we're going to predict that column and therefore we're going to drop it here and now we're going to combine the input underscore df to the entire data set of penguins and so the encoding code that we're using it is expecting that there are multiple values inside the particular column that it wants to encode so for example in the island variable it is expecting that there are three possibilities right the three different island or in the sex column it is expecting that there are two male female okay and so the thing is the input feature that we're using here will be from one penguin sample and so let's say that one penguin sample could have island as biscuo and therefore it will only 
know of one possibility and so this block of code will not work so therefore we have to integrate this input features on top of the existing penguin dataset so before we might have 333 rows then it's three three three plus one so that makes it three three four and then we're going to perform the encoding and then there will be two possibilities for sex three possibility for island okay all right so now we're going to display the user input in the user input features so right here user input feature is this block of code and we're going to use conditional again and so the first possibility is if there is an uploaded file write out the content otherwise write out the content of the slider bar and then also put the text that we are awaiting the csv file to be uploaded so in this scenario it's telling us that there is no input file uploaded and if we upload the input file then this line of code will be disappearing okay so let's proceed further and so this is the classification model part so in the previous part two of the streamlight tutorial on iris dataset we've built the random forest model right inside the streamlit application but in this tutorial we're just reading in the saved file which i have shown you at the beginning of this video so the pickled object is called penguin clf dot pkl so we're going to read that in and we're going to assign it the load clf variable and then here on line 71 we're going to create a prediction variable and then we're going to assign the value of the predicted value and so we're using load underscore clf dot predict function and then input argument is df and df is corresponding to the input features either from the uploaded file or from the sidebar okay and so that is df and then on line 72 we're also going to use df as the input argument to the predict prabha function and that will provide you the prediction probability okay and so in the block of code of line 75 through 77 it's going to write out the predicted value of the penguin and so here it is predicted to be gen 2 and then on line 79 and 80 it's going to predict the probability okay and so the probability values are shown in this data frame here and so here we can see that it is predicted to be gen 2 and there is a prediction probability of 81 so this can be taken as the relative confidence that we have in this prediction so it's kind of like we're 81 confident that this prediction is correct however for other prediction the probability could be decreased to 67. 
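Putting the penguins walkthrough together, here is a two-part sketch: first the one-off model-building script that pickles the classifier, then the prediction part of the app that loads it back in. The slider ranges are rough approximations, the species labels are Adelie, Chinstrap and Gentoo, and the file names follow the video, so verify them against your own copy of the files.

```python
# --- penguins-model-building.py: train once, pickle the model ---
import pandas as pd
import pickle
from sklearn.ensemble import RandomForestClassifier

penguins = pd.read_csv("penguins_cleaned.csv")
df = penguins.copy()
target = "species"
encode = ["sex", "island"]

for col in encode:
    dummy = pd.get_dummies(df[col], prefix=col)   # one-hot encode the qualitative columns
    df = pd.concat([df, dummy], axis=1)
    del df[col]

target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
df["species"] = df["species"].apply(lambda x: target_mapper[x])

X = df.drop("species", axis=1)
Y = df["species"]
clf = RandomForestClassifier()
clf.fit(X, Y)
pickle.dump(clf, open("penguins_clf.pkl", "wb"))

# --- penguins-app.py (prediction part): encode the user's row, load the pickle, predict ---
import numpy as np
import streamlit as st

uploaded_file = st.sidebar.file_uploader("Upload your input CSV file", type=["csv"])
if uploaded_file is not None:
    input_df = pd.read_csv(uploaded_file)
else:
    def user_input_features():
        island = st.sidebar.selectbox("Island", ("Biscoe", "Dream", "Torgersen"))
        sex = st.sidebar.selectbox("Sex", ("male", "female"))
        bill_length_mm = st.sidebar.slider("Bill length (mm)", 32.1, 59.6, 43.9)
        bill_depth_mm = st.sidebar.slider("Bill depth (mm)", 13.1, 21.5, 17.2)
        flipper_length_mm = st.sidebar.slider("Flipper length (mm)", 172.0, 231.0, 201.0)
        body_mass_g = st.sidebar.slider("Body mass (g)", 2700.0, 6300.0, 4207.0)
        return pd.DataFrame({"island": island, "bill_length_mm": bill_length_mm,
                             "bill_depth_mm": bill_depth_mm, "flipper_length_mm": flipper_length_mm,
                             "body_mass_g": body_mass_g, "sex": sex}, index=[0])
    input_df = user_input_features()

# combine the single input row with the full dataset so every category is seen during encoding
penguins_raw = pd.read_csv("penguins_cleaned.csv").drop(columns=["species"])
df = pd.concat([input_df, penguins_raw], axis=0)
for col in encode:
    dummy = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, dummy], axis=1)
    del df[col]
df = df[:1]   # keep only the user's row

load_clf = pickle.load(open("penguins_clf.pkl", "rb"))
prediction = load_clf.predict(df)
prediction_proba = load_clf.predict_proba(df)

st.subheader("Prediction")
st.write(np.array(["Adelie", "Chinstrap", "Gentoo"])[prediction])
st.subheader("Prediction Probability")
st.write(prediction_proba)
```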
so as you can see in the demo when we reduce the body mass the probability decreases but if we increase the body mass the probability increases three two one okay so welcome back to the data professor youtube channel my name is chanin nantasenamat and i'm an associate professor of bioinformatics and in this video i'm going to talk about how you can develop a web application powered by machine learning for the boston housing data set so without further ado we're starting right now okay so today we're going to create a simple web application that is powered by machine learning and we're going to use the boston housing data set as our example so in your data science learning journey you have probably come across the boston housing data set so let's take a quick look so the boston housing data set is comprised of 506 rows and 14 columns so the 14 columns are provided here so these are the descriptions of the 14 columns and the target response that we're going to predict is called the median value of owner occupied homes in units of 1000 dollars so if you would like some more details read these resources and so let's dive into the code okay so let me open up my atom text editor all right so let's have a look at the code so the code is 86 lines so i provided you some bonuses here so aside from making the prediction we're also going to provide you the understanding behind the prediction okay so these will be in about 10 lines of code and it's using the shap library so actually the entire code for the web application for predicting the house price is 71 lines of code so actually we could make it fewer lines of code if we just deleted the extra spacing that i have added here so like for example if i just do it like this then i would save a lot of space but i'm going to use extra lines in order to make the code look as simple as possible okay so let's get started so the first six lines of code as we see here are for importing the various libraries so on the first line we're gonna import streamlit as st on line number two we're going to import pandas as pd on line number three we're going to import the shap library and the shap library will provide you the understanding behind the prediction on line number four we're going to import matplotlib.pyplot as plt on line number five we're going to import the datasets from sklearn and on line number six we're going to import the random forest regressor from sklearn.ensemble okay so let me run this web application so that you could see it side by side with the code so let me click on the search and then type in cmd i'm going to go into my environment for running data science so i'm going to activate my environment called dp so for those of you who don't have dp as your environment you can create your own environment and then when you have already created that you will activate it using conda activate dp okay so i might probably make a future video about how you can create environments in conda so maybe a brief teaser so the reason for using an environment is that we don't want to mess up our computer so for example if we installed libraries that might influence the version number of other dependency libraries that might ruin your other data science projects so for example if you're using pandas 1.0 in one project and then in the other project you require pandas 0.93 then there will be incompatibility issues right so it's working for one version but then when you install another library which is depending on 0.93 then it's not going to work when you have pandas 1.0 so in order to counter that it would
be best to run a different data science project in a different environment okay so different meaning that not every single project that you're running you have to create a new environment but you're going to create a new environment if you're using different libraries okay so like for example here in this tutorial i'm using predominantly sklearn and streamlit so whenever i do tutorials about sklearn and streamlit i would use this particular environment okay so let's continue so i have already activated the environment and i'm going to move into the desktop here and then i'm going to run the code streamlit run and then the code name is boston house all right so i just type in the first field character hit on the tab and then it's going to auto fail for you hit on enter and then we're going to load up the local version all right so this is the web application that we're developing okay so i'm not sure why feature important is not shown here try loading it again it's running all right feature importance is loaded so this is the web app let's have it side by side all right here so lines number eight through 13 right here is going to be corresponding to the title of the web page which is right here so line number nine we're using the hashtag which is a equivalence to the h1 heading in html and it is called boston house price prediction app and then in the normal text we're going to type in this app predicts the boston house price and then in markdown language we're using two asterisk which is equivalent to bolding the text okay if i make it only one asterisk it will be in italic see it'll be in italy okay so i'm just undoing it that all right and then in lines number 16 we're going to import the boston house data set and then we're going to split that into x and y data frames so the x data frame will contain only the independent variables or the x variables so sometimes you call these as the input features and then in the y variable here we're going to have the target and the target here is the median value of the house price all right and on line number 22 until line number 52 it is the side panel right here so line number 22 it is the header of the side panel specify input parameters so lines number 24 until 52 we're defining a custom function for accepting user input features so here we see that we have the 13 features and then the 13 features will accept input here so we're going to make use of the st.sidebar dot slider and then as the input argument we're going to have the name of the feature crim and then the next is the minimum value the maximum value and the mean value so why do we need to add these values here minimum maximum and the mean so the minimum value here is the value that you're going to see here at the lower limit of the slider and then the max value will be at the upper limit of the slider and the mean value will be the default value that we're going to put into the input parameters okay so the mean value of crim is 3.61 right but once we change this then the prediction will be modified so we're going to use the same logic for the remaining input features and then finally we're going to put all of the features here into a data frame and the data frame will be returned here and then we're going to make use of the custom function and then we're going to call it df we're going to assign the default input features here into the df data frame and then in lines number 59 it's going to be corresponding to here specified input parameters line number 60 will be printing out the df 
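Before continuing with the remaining line numbers, here is a sketch of the Boston housing app as described so far. Note that load_boston has been removed from recent scikit-learn releases, so this only runs against an older version like the one used in the video, and the st.pyplot keyword-argument style is the older streamlit API.

```python
# a sketch of the boston house price prediction app described in this section
import streamlit as st
import pandas as pd
import shap
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

st.write("""
# Boston House Price Prediction App
This app predicts the **Boston House Price**!
""")

boston = datasets.load_boston()           # removed in recent scikit-learn versions
X = pd.DataFrame(boston.data, columns=boston.feature_names)
Y = pd.DataFrame(boston.target, columns=["MEDV"])

st.sidebar.header("Specify Input Parameters")

def user_input_features():
    # each slider runs from the feature's minimum to its maximum, with the mean as the default
    data = {}
    for feature in X.columns:             # CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT
        data[feature] = st.sidebar.slider(
            feature, float(X[feature].min()), float(X[feature].max()), float(X[feature].mean())
        )
    return pd.DataFrame(data, index=[0])

df = user_input_features()
st.header("Specified Input parameters")
st.write(df)

# build the model on every rerun and predict the user-specified row
model = RandomForestRegressor()
model.fit(X, Y.values.ravel())
st.header("Prediction of MEDV")
st.write(model.predict(df))

# explain the model's predictions with SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

st.header("Feature Importance")
plt.title("Feature importance based on SHAP values")
shap.summary_plot(shap_values, X)
st.pyplot(bbox_inches="tight")            # older streamlit API; newer versions expect a figure object

plt.title("Feature importance based on SHAP values (Bar)")
shap.summary_plot(shap_values, X, plot_type="bar")
st.pyplot(bbox_inches="tight")
```

The loop over X.columns is just a compact way of writing the thirteen sliders; the app in the video spells each slider out on its own line, which produces the same result.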
and then line number 61 prints out this long horizontal line as a divider and on lines number 64 until 67 we're going to build a random forest model we're going to train the model on line number 65. and then we're going to make the prediction on line number 67. so a point to note here is that a model will be built every time the input parameters are modified so in order to improve the code let me make it your homework try to create the model outside of the streamlit web app file that you're seeing here make a pickle file save it as dot pkl and then you want to load that pickle file into this web app so that you don't have to rebuild the model every time okay so you just build the model once you read the pkl file into the web app and then you will apply the pkl file which is the model to make the prediction using the input parameters that you specify in the sidebar okay so think of that as your homework and let me know in the comments how that went all right so on line number 69 you're going to see the header here prediction of the median value and then you're gonna use st.write to print out the prediction value which is 20.084 and then finally in the remaining 10 lines of code we're going to print out the plots provided by the shap library so lines number 75 and 76 are going to extract the shap values line number 78 is going to print out the header here feature importance line number 79 is going to print the title of the plot line number 80 is going to make a feature importance plot and then finally we're going to see the typical feature importance plot that we normally see when we're using random forest so here you're not going to see how each feature is contributing to the overall prediction for example you're going to see that lstat is important rm is important but then you're not going to see in what aspect they are important whether they are contributing positively or negatively to the predicted values so in the shap summary plot here you're going to see that lstat is contributing relatively equally on the negative aspect and the positive aspect but for the feature importance plot using the shap values you're going to see the relative distribution on whether it contributed to the negative or the positive side of the feature importance okay so that's something handy and useful to see in this video i'm going to show you how you can build a bioinformatics web application and without further ado we're starting right now so in prior episodes i have shown you how you could use the streamlit library in python to build simple web applications ranging from a simple financial web application where you could check the stock price a simple web application where you could predict the boston housing price and a penguin species prediction web application and so in today's episode we're going to talk about how you could build a simple bioinformatics web application and it's going to be based on the prior tutorial videos that are mentioned in this channel so the bioinformatics web application that we're going to be building today will be an extension of a tutorial series where i have shown you how you could build a molecular solubility prediction model using machine learning where particularly we are applying machine learning and python to the field of computational drug discovery and if you think of it in the grand scheme of things it is part of the bioinformatics research area and so this video will focus more on the aspect of actually building the web application and if you're interested in how to build the
In this part I'm going to show you how you can build a bioinformatics web application, and without further ado, we're starting right now. In prior episodes I showed you how to use the Streamlit library in Python to build simple web applications, ranging from a financial web application where you can check stock prices, to a web application that predicts the Boston housing price, to a penguin species prediction web application. In today's episode we're going to build a simple bioinformatics web application, based on those prior tutorial videos on this channel. The application we're building today is an extension of a tutorial series where I showed you how to build a molecular solubility prediction model using machine learning — specifically, applying machine learning in Python to the field of computational drug discovery, which, in the grand scheme of things, is part of the bioinformatics research area. This video focuses on actually building the web application; if you're interested in how the molecular solubility prediction model we use today was built, let me refer you to the prior tutorial videos on this channel — the links are provided in the video description and in the pinned comment of this video. So let's get started, shall we? We go to the streamlit directory, and the app lives in the solubility folder; these are the files we'll be using today. I also have the prediction model on Google Colab, so let me download that — here it is. This notebook is a concise version of a prior video where I showed you how to build machine learning models on the molecular solubility dataset, and as I mentioned, I'll provide the links to that video. Let's connect to the runtime and have a look; let me clear all of the outputs — it's connected now. We import pandas as pd and download the calculated descriptors directly from the Data Professor GitHub; these have already been computed, as described in the prior video. The computed descriptors are MolLogP, molecular weight, number of rotatable bonds, and aromatic proportion — these four are the X variables — and log S is the Y variable that we will be predicting. We separate this data frame into two sets of variables: for X we drop the log S column, and for Y we take the last column, indicated here by index -1, which is log S. So the X variables contain all of the columns except log S, and the Y variable contains only log S. Next we build a linear regression model: we import linear_model from sklearn, and we also import the mean_squared_error and r2_score functions from sklearn.metrics to compute the performance metrics. We run LinearRegression, assign it to model, and call model.fit using the X variables to predict Y, which builds the training model — ah, X is not defined, because I hadn't yet run the earlier cell, so let me do that first. All right, the model has been built, and then we perform the prediction: the result is assigned to the y_pred variable and is produced by model.predict with X as the input argument. Then we print the model performance. These four values are the regression coefficients for each of the four input variables of X, comprising MolLogP, molecular weight, number of rotatable bonds, and aromatic proportion. The regression coefficient represents the magnitude that each of these four variables contributes to the prediction of Y: the greater the magnitude, the greater the influence of that variable. The y-intercept is printed here, the mean squared error is 1.01, and the R-squared value is 0.77. Then we print out the model equation and visualize the scatter plot of the actual versus the predicted values — this is the plot of the experimental versus the predicted log S values.
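Condensed into one place, the notebook boils down to something like the following sketch; the CSV URL and the logS column name are my assumptions about the Data Professor dataset, so adjust them to whatever the notebook actually uses.

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Descriptor table: MolLogP, MolWt, NumRotatableBonds, AromaticProportion, logS
# (the URL below is an assumption -- point it at the CSV used in the notebook)
url = 'https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv'
dataset = pd.read_csv(url)

X = dataset.drop(['logS'], axis=1)   # the four computed descriptors
Y = dataset.iloc[:, -1]              # log S, the last column

model = linear_model.LinearRegression()
model.fit(X, Y)
Y_pred = model.predict(X)

print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
print('Mean squared error (MSE): %.2f' % mean_squared_error(Y, Y_pred))
print('Coefficient of determination (R^2): %.2f' % r2_score(Y, Y_pred))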
As we saw from the R-squared value, we're getting pretty good correlation between the experimental and the predicted log S values. Finally, we pickle the model object — essentially we save the model into a file called solubility_model.pkl — so that we can import it into the Streamlit web application. Once the cell has run, you can simply click in the corner here and download the file. It's essentially the same file we already have locally, but note that the scikit-learn version on Google Colab and the locally installed version may differ, which will produce some warnings, so it may be worthwhile to copy all of the code into a file and run it locally so the model is built with the same scikit-learn version; think of that as homework for you. Now let's get started building the web application. Head over to the streamlit folder and into the solubility folder and have a look at the contents. The Delaney solubility file here contains the input descriptors — the X and Y variables we used a moment ago to build the prediction model. I'm going to provide these files in the GitHub repository dedicated to this tutorial, along with the Jupyter notebook we just saw for building the prediction model of this solubility dataset. That notebook can use this input file, but since the code also downloads the data directly from the Data Professor GitHub we don't actually need the local copy, so I can delete it and just provide you the notebook. In total, three files are used for this web application. The first one is the logo, which I drew on the iPad using the GoodNotes application; the app is called Molecular Solubility Prediction App, and given an input molecule the machine learning model predicts its log S value. Now let's fire up the Atom editor and look at the contents of solubility-app.py. There are roughly 110 lines of code in total, and notice that several of those lines are comments, included just for ease of reading so you can see what each block of code is doing; with the comments removed it would be just under a hundred lines. Let's take a look at the code. The first block imports the libraries that are needed: numpy, which is used in the descriptor calculation; pandas, which we need to read in the dataset and to prepare the data frames in the generate function that computes the molecular descriptors (full detail, as I mentioned, is in the prior step-by-step video); streamlit, the hero of this tutorial and the basis of this web application; pickle, which lets us save the machine learning model and then load it into this web application; and the Image function from PIL, to display the logo I showed you a moment ago.
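Put together, the pickling step in the notebook and the import block at the top of the app look roughly like this sketch; the file names follow what's mentioned in the video.

# In the Colab notebook: save the trained model to disk
import pickle
pickle.dump(model, open('solubility_model.pkl', 'wb'))

# At the top of solubility-app.py: libraries the web app relies on
import numpy as np
import pandas as pd
import streamlit as st
import pickle
from PIL import Image
from rdkit import Chem
from rdkit.Chem import Descriptors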
We also import the Chem and Descriptors modules of RDKit, which are used for computing the molecular descriptors — molecular descriptors essentially let us describe molecules in terms of their physicochemical properties. For ease of use, we've already written custom functions for calculating the molecular descriptors; they're provided from line 12, where the comment is, until line 57, and there are two of them: the AromaticProportion function and the generate function. Let's take a look at the web application by running it now: conda activate dp — you don't need this step if you've installed all the libraries for your Python directly, but if you use conda then you want to activate the environment dedicated to your data science projects. I highly recommend installing conda and creating a specialized environment for your data science projects, because it helps you manage library dependencies; otherwise, if you have several projects on your computer, upgrading one library might downgrade other libraries or dependencies and break a prior project. So we've activated dp, we go to the streamlit/solubility directory where the three files are, and we run streamlit run solubility-app.py. And now this is the web application. The left-hand side holds the input parameters: the SMILES input, which represents the chemical information of the input molecule. Each line represents a different molecule — we have three lines here as an example, and you can replace them with your own data — and the app predicts the solubility value as a function of the input SMILES notation. Let me repeat that: in the SMILES input box, each line represents a single molecule, so the three lines here represent three molecules, and each SMILES notation tells us the atomic composition of the molecule. The first line, CCCCC, has five carbon atoms, the second has three carbon atoms, and the third is a carbon atom and a nitrogen atom. We can also search for the SMILES of a molecule of interest — let's search for aspirin, click on the entry, and find the SMILES (I'll use Command-F and search for "smiles"); it's here in section 2.1.4, Canonical SMILES. We copy that SMILES notation, paste it into the input box, and press Command-Enter to apply it (on Windows it's probably Ctrl-Enter), and note that the predicted value updates. So here we have the input SMILES notation, the same as on the left, and below it the computed molecular descriptors — the four physicochemical properties comprising MolLogP, molecular weight, number of rotatable bonds, and aromatic proportion. These four variables are computed using the RDKit library.
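A rough reconstruction of those two custom functions, assuming the usual RDKit calls rather than copying the app line for line, would be:

from rdkit import Chem
from rdkit.Chem import Descriptors
import pandas as pd

def AromaticProportion(m):
    # Fraction of heavy atoms in the molecule that are aromatic
    aromatic_atoms = sum(1 for atom in m.GetAtoms() if atom.GetIsAromatic())
    heavy_atoms = Descriptors.HeavyAtomCount(m)
    return aromatic_atoms / heavy_atoms

def generate(smiles_list):
    # Compute the four Delaney descriptors for each input SMILES string
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        rows.append([Descriptors.MolLogP(mol),
                     Descriptors.MolWt(mol),
                     Descriptors.NumRotatableBonds(mol),
                     AromaticProportion(mol)])
    return pd.DataFrame(rows, columns=['MolLogP', 'MolWt',
                                       'NumRotatableBonds', 'AromaticProportion'])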
They come from the Chem and Descriptors functions together with the custom function we wrote: MolLogP is the first column, molecular weight is the second, number of rotatable bonds is next, and aromatic proportion is a custom function, because RDKit does not compute that property directly, so we had to write our own function for it. Why these four variables — how did we know to compute these four descriptors? Because the original work by John Delaney used these four variables to build his prediction model; for more information, please have a look at the original research paper. And finally we have the predicted log S value, here predicted to be -2.0931, which is the relative solubility value. So there we have it: a simple web application for predicting molecular solubility values. Let's look further into the code. Line 63 handles the image: it opens the logo, assigns it to the image variable, and then st.image actually displays it, with the column width option set to true so the image expands to fit the column width. Then we write out the title, Molecular Solubility Prediction App — we can even modify it to say "Web App", save, and rerun, and you'll see the name update — followed by the description text. This is in Markdown format, and Markdown lets us format text with links, italics, or bold face. For example, to make text bold we use a double asterisk before and after, so to highlight "Solubility (LogS)" we wrap it in double asterisks; for bold and italic together we need three asterisks (click on Always rerun and you'll see it is both italic and bold); and with only a single asterisk the text becomes italic, while two make it bold. To add a link, we use square brackets to define the boundaries of the text we want linked and put the URL in parentheses; you can click on it and it takes you to the original paper, which was published in 2004.
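In code, that image-and-title block amounts to roughly the following; the logo file name, the exact wording of the Markdown text, and the paper URL are my assumptions, included only to show the pattern.

import streamlit as st
from PIL import Image

# Open the hand-drawn logo and display it, expanded to the column width
image = Image.open('solubility-logo.jpg')   # file name assumed
st.image(image, use_column_width=True)

st.write("""
# Molecular Solubility Prediction App
This app predicts the **Solubility (LogS)** values of molecules!
Data obtained from the John S. Delaney [paper](https://pubs.acs.org/doi/10.1021/ci034243x).
""")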
Okay, so this block of code reads in the input features, the SMILES notation. st.sidebar.header produces the heading here, "User Input Features", and then we read the SMILES input. Notice that the default example starts with five C characters — let's rerun, and the CCCCC is right here — then a backslash-n, which becomes a new line, then three Cs, which is the second line, then another backslash-n and CN, which is the third line. So the lines are CCCCC, CCC, and CN, and that string is the default input text of the SMILES notation. The text box you see here comes from the text_area function, whose input arguments are "SMILES input", the label right here, and SMILES_input, the example SMILES string we just looked at; if we change the first character from C to N, the text box changes accordingly and now starts with N. Then we prepend a dummy first item, which makes it simple to read in multiple lines of SMILES notation, and later we skip that dummy first item; we do the same thing in several places, on lines 91 and 96 and when we make the prediction. This lets us build a data frame of the SMILES notation, and of the computed molecular descriptors, even for a single input molecule. With multiple molecules this works fine either way, but without the dummy item — let me remove it in the three places, lines 87, 96, and 110, save, and see what happens — we get an error when we try to make a prediction... ah, I have to remove it on line 91 as well, so it's actually lines 87, 91, 96, and 110.
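To make the dummy-item trick concrete before continuing the experiment, here is a sketch of what that input block boils down to; this is my reconstruction, with the example SMILES taken from the video and the dummy value assumed to be a lone carbon.

import streamlit as st

st.sidebar.header('User Input Features')

# Default example: three molecules, one SMILES string per line
SMILES_input = "CCCCC\nCCC\nCN"

SMILES = st.sidebar.text_area("SMILES input", SMILES_input)
SMILES = "C\n" + SMILES        # prepend a dummy first molecule
SMILES = SMILES.split('\n')    # one list entry per line

st.header('Input SMILES')
st.write(SMILES[1:])           # skip the dummy item when displaying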
Okay, let's rerun it without the dummy item and see whether it works. Since we're no longer skipping the first entry, we have to display X as-is (otherwise nothing is shown), display the SMILES as-is, and show the full prediction — in other words, instead of slicing from the second value onwards we print all of it. Now it works, and notice that it works fine when the SMILES input has multiple lines. But what if there is only one line? With a single molecule, Command-Enter... and now we get an error. So, for ease of handling a single input molecule, we use the dummy item as I mentioned and read from the second value onwards; restoring that, it works again — we have a single value, we make a prediction, and it works. All right, let's continue. As I already mentioned, we have the input SMILES notation here, and the computed molecular descriptors are produced in this block of code. It's very simple, because all of the hard work is done in the custom functions: we call only generate, since generate itself uses the AromaticProportion function. On line 95 we call generate with SMILES as the input argument, where SMILES is the input we read earlier, split on the backslash-n characters, so each line of the SMILES notation represents a unique molecule. Each molecule then has its molecular descriptors computed, and the result is shown here as the X variable. Recall y = f(x): X is the set of input parameters for predicting the Y value, which is log S. The model has already been pre-built on Google Colab a moment ago, and we load it here using the pickle.load function and the solubility_model.pkl file; then we use the loaded model, load_model.predict, with the X we just computed as the input. So there are three major components: the input SMILES (the example strings here, or whatever we paste in, such as aspirin) goes in, it is subjected to descriptor calculation by the generate function, which gives us the X variables, and the X variables are the input argument when we make a prediction. Finally, we take the computed molecular descriptors contained in X together with the loaded model that we built on Google Colab, saved out with pickle, and loaded back into the Streamlit web application; we apply the model with load_model.predict and X as the input, st.header produces the "Predicted LogS values" heading right here, and the actual prediction is displayed right below it.
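Pieced together, that prediction block looks something like this sketch, reusing the generate function and the skip-the-dummy-item slicing described above.

import pickle
import streamlit as st

# Compute the four descriptors for every input molecule (dummy item included)
X = generate(SMILES)

st.header('Computed molecular descriptors')
st.write(X[1:])   # skip the dummy first item

# Load the pre-built regression model and apply it to the descriptors
load_model = pickle.load(open('solubility_model.pkl', 'rb'))
prediction = load_model.predict(X)

st.header('Predicted LogS values')
st.write(prediction[1:])   # skip the dummy first item here too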
We can also have multiple molecules in the input: if we change one of them to something else, we get a different molecule and the prediction value changes accordingly. For example, if we change a carbon to a nitrogen, the log S value changes, and you'll see that it influences the molecular weight and the MolLogP but not the number of rotatable bonds or the aromatic proportion — changing a carbon to a nitrogen changes a single atom, so it influences two of the descriptors. So congratulations: you have now built a very simple bioinformatics web application, specifically for drug discovery. Have you been using the Streamlit library to create web applications? Perhaps you've already deployed your web application locally, but do you want to deploy it onto the internet so other people can access your awesome web application? If you answered yes, then watch this part to the end, because I'm going to show you how to deploy a Streamlit web application onto the internet using Heroku; without further ado, we're starting right now. The first thing to do is head over to the Data Professor GitHub and go to the repository called penguins-heroku: all of the files we'll use to deploy onto Heroku are provided in this repository. The web application we're going to deploy is the penguins classification web app that we previously created in the third part of this Streamlit tutorial. Looking at the repository, it comprises the readme, which provides the details shown on GitHub, plus the following four files copied from the GitHub of the penguins web application: penguins-app.py, penguins_cleaned.csv, penguins_clf.pkl, and penguins_example.csv. These four files were copied directly from the repository I'm going to show you right now — go to Code, then streamlit, and it was from Part 3.
So we copied everything except model-building.py, because model-building.py is what produces the .pkl — the saved model — and the .pkl is what we actually need. Heading back: aside from the four files directly related to the Streamlit web application, we also create three additional files, the Procfile, requirements.txt, and setup.sh. The Procfile essentially runs the setup.sh file and then runs the Streamlit web application; requirements.txt tells Heroku which Python libraries to install and which versions to use; and setup.sh handles the server-side configuration, in particular writing the port number into the configuration. So how did I come up with the precise version numbers in requirements.txt? Open the command prompt (I typed cmd into the search box) and activate your conda environment — if you have one, activate it by name; on my computer it's called dp. Then type conda list and you'll see the exact version numbers installed on your machine. We have streamlit here with its version number; note that I've been using different computers — the machine I'm recording on is a Lenovo desktop running Windows, while the numbers in the requirements.txt are from my MacBook Pro — so they differ slightly: pandas on this Windows machine is 1.0.1 whereas the MacBook Pro has an older version, numpy is 1.18.1, and scikit-learn is 0.22.1. The important thing is that if you've already tested that your web application works on your current computer, you copy those exact version numbers into requirements.txt; at the time of testing I used the MacBook Pro, those were its version numbers, and everything worked. So in requirements.txt you include all of the libraries you're using in your code. Why these four libraries? Looking at the code, in penguins-app.py we use the streamlit, pandas, numpy, and scikit-learn libraries (pickle is built in, so it doesn't need to be included) — four in total — each with its exact version number, and note that you need two equals signs between the name and the version. So we have a total of seven files, eight including the readme: the four files beginning with penguins relate to the web application we created in Streamlit, and the three files comprising the Procfile, requirements.txt, and setup.sh tell Heroku what to do, which libraries and versions to install, and the precise command to run in order to start the Streamlit web application.
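For reference, the gist of those three helper files looks roughly like this; the app file name should match your own app file, and the version numbers are placeholders — use whatever conda list reports on the machine where you tested the app. Treat this as a sketch rather than the repository's exact contents.

# Procfile
web: sh setup.sh && streamlit run penguins-app.py

# setup.sh -- write the port Heroku assigns into Streamlit's server config
mkdir -p ~/.streamlit/
echo "[server]
headless = true
port = $PORT
enableCORS = false
" > ~/.streamlit/config.toml

# requirements.txt -- pin the exact versions reported by `conda list`
streamlit==0.71.0
pandas==1.0.1
numpy==1.18.1
scikit-learn==0.22.1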
Now let's head over to Heroku: if you haven't yet signed up for Heroku, sign up, and if you have, log in. After logging in you'll see the dashboard with the web applications I've already created on Heroku. To create a new app, click on New and then Create new app, and give your app a name. Let me try "penguins" — that doesn't work; how about "penguins-st" (st for Streamlit), or just "penguins-streamlit" — okay, that works. Once you're satisfied with the name, make sure it's available; otherwise it shows an error in red saying it's not available, and you just keep trying until you find a name that is. You can also select the region of the server that will host your web application — the options are United States and Europe, and we'll select United States — then hit Create app and wait a moment. For the deployment method, click on GitHub. I have already connected my GitHub account with Heroku; if you haven't done that yet, you'll need to do it as well. Once your GitHub account is connected, type in the name of the repository — it's called penguins-heroku — click Search, then click Connect. Here you can also enable automatic deploys, meaning that whenever the web application files change on GitHub, Heroku automatically redeploys your web application; I'm going to leave that off and proceed with a manual deploy. Leave the branch at the default, master, and click Deploy Branch. Scroll down and watch the feed here, which provides the output in real time: it's currently installing the necessary Python libraries. The good thing about Heroku is that you don't have to worry about the server or maintain it — the only thing you need to care about is the application itself. You upload your application to GitHub, connect it to Heroku, and simply deploy the web application. It's installing dependencies as well, which might take a couple of minutes, so in the meantime you might want to grab a cup of coffee, sit back, relax, and enjoy. Okay, it's finished, and it's going to be deployed at penguins-streamlit.herokuapp.com; wait a couple of seconds and the link appears at the bottom — it says your app was successfully deployed — and you can simply click the View link, which brings you to the web app. It's loading, and the first time a web application loads it might take some time. Now the web application is loaded, so let's play around with the input parameters, and as you can see, the predicted label changes along with the prediction probability. Congratulations, you have now successfully deployed your Streamlit web application onto Heroku. Next: do you want to deploy the web application you've just created in Python using the Streamlit library? If you answered yes, then this part is for you, because I'm going to show you how you can easily deploy your web application onto Streamlit sharing, and without further ado, we're starting right now. The first thing to do is head over to the GitHub of the Data Professor.
Click on Repositories and then on streamlit10 — streamlit10 is essentially the tenth episode of the Streamlit series on this YouTube channel, and here we're using sp500-app.py; I'll provide the links to the tenth episode in the description of this video. We're going to use this app for the deployment in this tutorial, so let's have a look at it. In a nutshell, the app web-scrapes the S&P 500 data directly from Wikipedia, downloads the stock price data from Yahoo Finance via the yfinance library, creates a data frame of the data, lets us download the data as a CSV file, and finally makes some beautiful plots. So we're going to proceed with deploying the app. Before continuing further, it should be noted that we also need the requirements.txt file, which lists all of the dependencies — the libraries we're using in the web app — along with the corresponding version numbers. How do I get this? Let me show you. Go to your terminal (I've already activated my conda environment) and type pip freeze, sending the output to requirements.txt. Looking at the contents of that file, it lists all of the libraries installed in my conda environment along with the corresponding version numbers, so I selectively choose the libraries that are used for the web app and copy the corresponding lines — for example, we make use of the yfinance library, so we copy that line; we also use matplotlib, so we find matplotlib right here and copy that; and you do the same for all the other libraries you're using. So essentially you have the app file itself, the requirements.txt, and the README.md file — the readme is normally created automatically if you ticked that option, and it's what shows this readme text on GitHub; I'm also going to show you how to include a button that you can click to launch the web application.
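The requirements.txt workflow described here is just two steps — generate the full list, then trim it down to the libraries the app imports. A sketch, with the package versions left as placeholders since they should come from your own environment:

# Dump every installed package and its version into requirements.txt
pip freeze > requirements.txt

# Then keep only the lines for libraries the app actually imports, e.g.:
#   streamlit==<your version>
#   pandas==<your version>
#   yfinance==<your version>
#   matplotlib==<your version>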
Now let me open up the Streamlit sharing website, share.streamlit.io. Note that I've signed in using my GitHub account; let me sign out to show you — the first time you sign in, it asks for your authorization to let Streamlit access your GitHub account, and since I've already granted that, it logs in seamlessly. This is the dashboard of the Streamlit sharing website after logging in, and I'm going to deploy my first web application here by clicking the New app button. I click on the repository field and choose dataprofessor/streamlit10; the branch is automatically detected as main — if I go back to GitHub, notice that it says main, although it could be master depending on your own GitHub profile. Then we need the name of the app file, which is sp500-app.py, so let's copy that and paste it here. That's all we need to do in order to deploy the app: click the Deploy button and wait a while. If you see the icon come up, it means your app is probably going to be deployed successfully, provided you also have the requirements.txt file mentioned previously. While it installs all of the necessary libraries for you, you can see the whole log here: most of the requirements have already been met, it has installed the yfinance library, and it says it has processed the dependencies. Then you see the balloons, which means it has been successfully deployed. I can click the bottom-right button to minimize the log, or bring it up again by clicking Manage app, and hide it again. Congratulations — you have now successfully deployed the web application onto the Streamlit sharing website at share.streamlit.io. Let's have a look at the plots: we see one plot here, and if I move the selection to two, I see two plots. So there you have it — the deployed web application, which is quite easy to set up. It should be noted that, at the time of recording, Streamlit sharing is in a beta trial, which means you need to be among the 1,000 selected users in order to try out this new feature, so head over to the Streamlit sharing website and click on the request-for-invitation option so the Streamlit team will invite you to try out this awesome feature; I'll provide you the links to that as well.
Info
Channel: freeCodeCamp.org
Views: 347,569
Rating: 4.9697242 out of 5
Id: JwSS70SZdyM
Length: 191min 52sec (11512 seconds)
Published: Thu Jan 07 2021