Analyzing Recent COVID-19 Trends Using Python | COVID-19 Data Analysis | Python Training | Edureka

Hello everyone, this is Junit from Edureka, and I welcome you all to this session, where I'll demonstrate how to perform analysis on live COVID data. Without any further ado, let me take you through today's agenda. We'll start by discussing what data analysis is and why we need it. Following that, we'll look at the steps involved in analyzing a data set. Moving ahead, I'll talk about the benefits of data analysis and the tools used for it, and finally we'll see how to scrape the latest COVID updates and perform analysis on them. Before we begin, do consider subscribing to our channel and hitting the bell icon to stay updated on trending technologies. Also, if you're looking for online training and certification in data analytics, please check out the link in the description box below.

All right, so why do we need data analysis? With so much information being collected in the business world today, we need a way to paint a picture of that data so that we can interpret it. Data analysis gives us a clear idea of what the information means by giving it visual context through maps or graphs. This makes the data more natural for the human mind to comprehend, and therefore makes it easier to identify trends, patterns and outliers in large data sets.

So what is data analysis? Data analytics is the science of analyzing raw data in order to draw conclusions about that information. Many of its techniques and processes have been automated into mechanical processes and algorithms that work over raw data for human consumption, and data analytics helps businesses optimize their performance.

Now let's look at the steps involved. There are five basic steps in analyzing data, so let's go through each of them in detail, starting with step one: asking the right questions. The first step towards any sort of
data analysis is asking the right questions of the given data. Once the objective of the analysis has been identified, it becomes easier to decide what type of data we'll need to draw conclusions.

Step two is data wrangling. Data wrangling, sometimes referred to as data munging or data preprocessing, is the process of gathering, assessing and cleaning raw data into a form suitable for analysis. It has three sub-steps. The first is gathering the data: after identifying the objective behind the analysis, the next step is to collect the data required to draw appropriate conclusions, and there are various methods by which we can collect it. The second is assessing the data: after the data has been gathered, sorted into the proper form and assigned to a variable in Python, it's time to get a high-level overview of the type of data we are dealing with. The third is data cleaning, the process of detecting and correcting missing or irregular records in a data set. Here the data starts out in raw form and is cleaned so that the output is free of missing and inaccurate values.

Step three is exploratory data analysis, or EDA for short. Once the data is collected, cleaned and preprocessed, it is ready for analysis. As you manipulate the data, you may find that you have exactly the information you need, or you may need to collect more. During this phase you can use data analytics tools and software that help you understand and interpret the data and draw conclusions based on your requirements.

Step four is drawing conclusions. After the analysis phase is complete, the next step is to interpret the analysis and draw conclusions from it. As we interpret the data, there are three key questions to ask ourselves. First, did my analysis answer my original question? Second, was there a limitation in my analysis that could have
affected my conclusions? And third, was the analysis sufficient to support decision-making?

Finally, step five is communicating the results. Now that the data has been explored and conclusions have been drawn, it's time to communicate your findings to the concerned party, or to a wider audience through data storytelling. This can be done by writing a story, writing blogs, or presenting the findings in reports. Strong communication skills are a plus at this stage, since your findings need to be communicated properly to other people.

So what are the benefits of data analysis? Data analysis positively affects an organization's decision-making process. With interactive visual representations of data, businesses can recognize patterns more quickly, because they can interpret the data in graphical or pictorial form. Here are some of the more specific ways data visualization can benefit an organization. The first is correlations in relationships: without data analytics it is challenging to identify correlations between independent variables, and making sense of those relationships helps us make better business decisions. Next are trends over time. While this seems like an obvious use of data analytics, it is also one of its most valuable applications: it is impossible to make predictions without the necessary information from the past and the present, and trends over time tell us where we were and where we could potentially go with a particular business model. Next is frequency, which is closely related to trends over time: examining how often customers purchase, and when they buy, gives
us a better feel for how a potential new customer might act and react to different marketing and customer acquisition strategies. Next is examination of the market: data analytics takes information from different markets to give insight into which audiences to focus your attention on and which ones to stay away from, and displaying that data in various charts and graphs gives us a clear picture of the opportunities within those markets. Finally there is risk and reward. Interpreting value and risk metrics requires experience, because without data visualization we must interpret entire spreadsheets full of numbers. Once the information is visualized, we can pinpoint the areas that need our attention.

Now that we know why data analytics matters and how it is used in the real world today, let's look at the tools used for analyzing data. If you think there are only a few tools for data analytics, this might take you by surprise: there are a great many of them. One popular way to acquire data is web scraping, for which we can use Beautiful Soup, and to visualize our data there are tools such as Power BI, Seaborn, TensorBoard and many more.

Now that you have a brief intuition of what it means to perform analysis on a data set, let me take you to my Jupyter notebook and show you how to scrape real-time COVID data, then visualize and analyze it. The code editor I'm going to use today is a Jupyter notebook. The first step we discussed earlier is acquiring the data, so let me quickly write the first heading here: data collection.
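The scraping and cleaning steps walked through below (Beautiful Soup plus a cell-cleaning loop) can be approximated with the Python standard library alone. This is a minimal sketch, not the code used in the session: it swaps Beautiful Soup for html.parser.HTMLParser so it runs without third-party packages, parses a small hypothetical HTML snippet instead of the live Worldometer page, and applies simplified cleaning rules (commas removed, "N/A" becomes 0, a blank cell becomes -1). The table id main_table_countries_yesterday and the column layout (row number, country, numeric columns, continent) are assumptions taken from the session; the helper names TableRowParser, clean_cell and scrape_rows are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row, from the table
    whose id equals table_id. Nested tables are not handled."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])          # start a new row
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.rows[-1].append("")      # start a new, empty cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data     # text inside <a> tags is kept too


def clean_cell(raw):
    """Turn one numeric cell into a float: 'N/A' -> 0.0, blank -> -1.0."""
    text = raw.strip().replace(",", "")
    if text == "N/A":
        return 0.0
    if text == "":
        return -1.0
    return float(text)                    # float() accepts a leading + or -


def scrape_rows(html, table_id="main_table_countries_yesterday"):
    """Return [country, number, ..., continent] lists, skipping the header row."""
    parser = TableRowParser(table_id)
    parser.feed(html)
    records = []
    for cells in parser.rows[1:]:         # rows[0] is the header row
        cells = cells[1:]                 # drop the leading row-number column
        country, continent = cells[0].strip(), cells[-1].strip()
        numbers = [clean_cell(c) for c in cells[1:-1]]
        records.append([country, *numbers, continent])
    return records


# Hypothetical miniature of the page; the real table has many more columns.
SAMPLE_HTML = """
<table id="main_table_countries_yesterday">
  <tr><th>#</th><th>Country</th><th>TotalCases</th><th>NewCases</th><th>Continent</th></tr>
  <tr style=""><td>1</td><td><a href="#">CountryA</a></td><td>1,234,567</td><td>+2,000</td><td>Europe</td></tr>
  <tr style=""><td>2</td><td>CountryB</td><td>N/A</td><td></td><td>Asia</td></tr>
</table>
"""

print(scrape_rows(SAMPLE_HTML))
```

Unlike the loop described in the session, which strips "+" and "-" before converting, this sketch lets float() parse the sign, so a negative correction stays negative. For the live page you would first fetch the HTML (for example with urllib.request.urlopen and a User-Agent header, as shown in the session) and check that the table id and column order still match.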
Data collection is all about acquiring the right data at the right time. After that we have data preprocessing, then exploratory data analysis, which I'll write as the short form EDA, and finally drawing conclusions. That's what we're going to do today.

To collect our data, I could use a CSV file; for current COVID trends, I could download the latest version from Kaggle. Instead, let's scrape fresh data from the web. I'll be using a website called Worldometer. If you don't know it, Worldometer is a website that provides live updates of COVID cases. Let's go to the "view by country" section, where we are presented with a table. I'm going to scrape this website, take this table, plot the data in different forms and use its different features: we have columns such as total cases, new cases and total deaths, and we'll perform analysis with respect to each of these.

To scrape the data I'm going to use Beautiful Soup. If you don't have it, you can install it with pip install beautifulsoup4. I've already installed it, so I won't install it again. Now let's import it: from bs4 import BeautifulSoup as soup. I'm aliasing it so that I don't have to type out the full name BeautifulSoup every time. Let me also import the date and time utilities: from datetime import date, datetime. Now let's get straight down to scraping our website. First off, let me give a
comment here: web scraping. Now let me create a variable called url and paste in this page's URL. Next, I'm going to send a request. Whenever you deal with websites, there are two parts: the client and the server. We are the client; we send a request to the server, and in return the server gives us a response. To send the request we'll use the urllib.request module, so let me quickly import what we need: from urllib.request import Request, urlopen. urlopen is what actually opens the web page. Let me execute this cell.

Now let me create a variable called request by calling the Request class and passing it the URL. Let's also pass headers: a dictionary whose User-Agent is Mozilla/5.0. To open the web page, I write webpage = urlopen(request). If I print webpage, it just gives us an object: as you can see, it's an HTTP response. We sent a request, and this is the response we got back from the server.

Now we parse it, and this is where Beautiful Soup comes in. I'll call the result page_soup: page_soup = soup(webpage, "html.parser"), passing the web page and the HTML parser. Let me execute this; it can take some time depending on your internet speed. And apart
from that, if you want to see what the page looks like, page_soup holds the entire HTML of this web page.

Next, I want to use the date to show the user which day's data is being displayed, and for this I'm going to use the system date. If you look at the site, most of today's figures are not yet updated, so I'm going to take the previous day's data instead; that way we get a complete data set. First, today = datetime.now() gives me an object for the current day. Since we need the previous day's data, I'll create a variable called yesterday_str (you can give it any name you like) and build it as a formatted string combining the month from date.today().strftime(), today.day - 1 for the day, and the year. Subtracting one gives the previous day (on the first day of a month this yields day 0; subtracting timedelta(days=1) from date.today() avoids that). Let me execute this, and as you can see it gives us the previous day's date.

Now let's get down to scraping the table. Let me rearrange this cell. We have the complete HTML page in page_soup, but I don't want the entire page, just this particular table, so let's parse out the table. Let's say
table = page_soup.find_all("table", id="main_table_countries_yesterday"). Here we pass "table", which is the table tag, along with an id. If you know the basics of HTML, you'll know that an id is unique: no two elements can share the same id. The id of this table is main_table_countries_yesterday. Let me execute this; the id needs underscores, and the variable is page_soup. As you can see, comparing with the site, we are getting only this table. If you want to check the id yourself, it's clearly visible in the page source: main_table_countries_yesterday.

Next we parse this table further to get the column data and the row data. Let me create a variable called containers: containers = table[0].find_all("tr", {"style": ""}). I index with zero because find_all returns a list; then I find all tr tags, which represent table rows, whose style attribute is empty. Let's execute this and see what containers holds. As you can see, it's a list. containers[0] gives the header row: the country name, total cases, new cases and so on. For the rest of the data, containers[1] gives the World row with all of its figures, which corresponds to this part of the table.

Now I'll create an empty list, all_data = [], to hold all of this data, and then write a for loop to iterate over the countries in containers. So what I'm going to do here is
for country in containers, since each row is one country. Before the loop, I save the first row as the title, title = containers[0], and delete it from the list with del containers[0]. Inside the loop, we collect each country's values in a temporary list, country_data = [], and parse the country's cells with country_container = country.find_all("td"). If country_container[1].text == "China", we skip the row with continue. Then there is an inner loop, for i in range(1, len(country_container)). country_container holds the list of cells for that country; for example, if the row is India, it holds all the values for India as a list.

Inside the inner loop, final_feature = country_container[i].text. Then, if clean is true and i != 1 and i != len(country_container) - 1 (that is, the cell is neither the country name nor the last column), we clean the value. We haven't defined clean yet; I'll do that shortly. Let me zoom in a bit. For cleaning, first final_feature = final_feature.replace(",", ""), which removes the thousands separators. Sometimes the values also carry plus and minus signs, so we fix those as well. We check whether final_feature
contains a plus sign, using final_feature.find("+") != -1 on the string. If it does, we replace the plus with an empty string, final_feature = final_feature.replace("+", ""), and convert the result to a float. We do the same for negative values with elif final_feature.find("-") != -1, replacing the minus with an empty string and converting to a float (note that stripping the minus sign this way turns a negative value into a positive one). That completes the if clean block.

Sometimes we will also have missing data. If final_feature == "N/A", meaning not available, we set final_feature = 0, and if final_feature is an empty string or a single space, we replace it with -1. Once the value is cleaned, we append it to country_data with country_data.append(final_feature), and once we come out of the inner loop we append country_data to all_data with all_data.append(country_data); all_data is the list we created earlier. Let me zoom out: as you can see, it's a fairly lengthy piece of code. Finally, we want to see what all_data looks like. When I first execute this there's a small typo: it should be or. As I mentioned earlier, we also still need to define clean, so I'll define it near the top of the cell and execute again. Now let's see how the data looks. As you can see
here, each entry in all_data is one row: for the US we have one set of values, and further down, for India, another; each corresponds to a row in the table on the site.

Next we'll put this into a data frame, which is straightforward. First we need pandas, so let me import it with import pandas as pd, and let's also import NumPy with import numpy as np. Now I create the data frame with df = pd.DataFrame(all_data) and look at it with df.head(). As you can see, all the data is present, and wherever data was missing we filled it with -1; the World row, for example, has some missing values, and the same goes for other rows.

The only thing left is to name the columns. You could scrape the header row, take out the titles and build the names in a for loop, or you can simply hard-code them. Since we've already used a for loop once, I'll hard-code the headers here as a list called column_labels. Let's see if we can copy them from the site; no, it seems we have to type them. First we have Country, then Total Cases, New Cases, Total Deaths, New Deaths, Total Recovered, New Recovered, Active Cases, Serious/Critical, Total Cases per million, then Deaths per million,
then Total Tests, Tests per million and Population, and finally the Continent the country belongs to. Columns 15, 16 and 17 contain a lot of missing data, so I'll drop them with df.drop, passing the column labels 15, 16 and 17. You can either pass inplace=True or assign the result back with df = df.drop(...); here I'm using inplace=True. We get an error at first because we haven't specified an axis: the axis has to be 1, since we are removing columns, so the call is df.drop([15, 16, 17], axis=1, inplace=True). Now it works. All that's left is to give the data frame its column names with df.columns = column_labels. Let's execute this and look at df: as you can see, we have the country, total cases and new cases, through to population and the continent the country belongs to. To see just the first few rows, use df.head().

There are also three columns I'd like to add: percentage increase in cases, percentage increase in deaths and percentage increase in recoveries. The percentage increase in cases, deaths or recoveries is simply the new count divided by the total count, times 100. So let me create these columns, starting with the percentage increase in cases, computed as df["New Cases"] / df["Total Cases"] * 100. Let me copy this and do the same for deaths and recoveries, changing the formula to new deaths divided by total deaths, and new recovered divided by the total number
of people who have recovered. Let's execute this. We get an error, so let's see where we're going wrong. We have total cases here and new cases here, and it looks like we haven't typecast them: all this data is stored as strings, so before computing the percentages we have to convert everything to numeric form. That's easy with a for loop placed before the percentage cell: for label in df.columns, if the label is not the country name or the continent, we convert it, because apart from those two columns everything is numeric data. So the loop sets df[label] = pd.to_numeric(df[label]). Let me execute this; it simply converts the data to numeric form. Now the percentage cell should run, but there's a small typo in the new deaths formula, so let me copy and paste the column name; the error was here, and it should be Deaths. Let me execute again. Now if I print our data frame with df.head(), you will see the extra columns for the percentage increase in cases, deaths and recoveries.

That's all there is to data collection and data preprocessing, so we've completed the first two parts. Moving on to stage three, EDA, and then the conclusions, let me add a small title here: EDA, which
stands for exploratory data analysis. Let me add some cells. To start the EDA, we'll do a very generic visualization: we'll compare the numbers of cases, deaths and recoveries, and then see how cases vary with respect to different parameters, such as serious cases, active cases and total deaths across countries.

First, let's take the total number of people who have recovered, the total number of active cases and the total number of deaths, that is, the Total Recovered, Active Cases and Total Deaths columns here, and plot them. So let's put the parameters we want in a variable called cases, which selects those columns of the data frame. We need to wrap the column names in square brackets, as a list, or pandas will throw a KeyError. Then we use .loc[0], because we want the values of the first row, the World totals: cases = df[["Total Recovered", "Active Cases", "Total Deaths"]].loc[0]. If I display cases, it gives me these values. Now I'll create a separate data frame to hold this data with the index reset: cases_df = pd.DataFrame(cases).reset_index(). At first it throws an error because I made a small typo; this usually
happens when you press Tab to autocomplete. As you can see, we now have this in the form of a data frame. Before we plot it, let's name its columns, renaming the index and 0 columns produced by reset_index: cases_df.columns = ["Type", "Total"]. Next we'll add a percentage column: cases_df["Percentage"] = np.round(100 * cases_df["Total"] / np.sum(cases_df["Total"]), 2), which divides each total by the sum of the totals and rounds to two decimal places. When I first ran this I assigned the percentage to df by mistake; we want it on cases_df, so let me fix that. As you can see, of all cases about 85 percent have recovered, 12 percent are active and about 2 percent have died. Now we'll add one more column, Virus, holding the virus type; we all know the virus here is COVID, so we'll put COVID-19.
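The Percentage column above is each category's total divided by the sum of all three totals, times 100, rounded to two decimal places. Below is a minimal standard-library sketch of the same arithmetic, using hypothetical counts (not real COVID-19 figures) and a hypothetical helper name, share_of_total; in the notebook the equivalent is the np.round(100 * cases_df["Total"] / np.sum(cases_df["Total"]), 2) expression. It also attaches the constant Virus label to every record, as the notebook does next.

```python
def share_of_total(totals, decimals=2):
    """Return each category's percentage of the grand total, rounded.

    Mirrors 100 * x / sum(x) followed by rounding, as in the Percentage column.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("the totals must not sum to zero")
    return {name: round(100 * value / grand_total, decimals)
            for name, value in totals.items()}


# Hypothetical counts for illustration only; these are not real COVID-19 figures.
totals = {"Total Recovered": 850, "Active Cases": 120, "Total Deaths": 30}
shares = share_of_total(totals)

# One record per category, with the constant Virus label added to every row.
records = [
    {"Type": name, "Total": totals[name], "Percentage": pct, "Virus": "COVID-19"}
    for name, pct in shares.items()
]

for record in records:
    print(record)
```

Because each share is rounded separately, the shares need not add up to exactly 100, which is why figures such as 85, 12 and 2 percent can sum to 99.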
We add it once for each row, for i in range(len(cases_df)), and then we finally get down to plotting. Let me first show you what the data frame looks like now. We're going to plot Percentage against Virus, and for this we're going to use Plotly, which provides several plotting interfaces. Let me add a new cell and import all the plotting-related libraries: import matplotlib.pyplot as plt, import plotly.graph_objects as go, and import plotly.express as px. The reason I'm using Plotly instead of Seaborn or Matplotlib is that Plotly gives me better, interactive visualizations. Then import plotly.offline as py and import seaborn as sns. We also import a couple of basic dependencies: import gc, and, since we don't want warnings cluttering the output, import warnings followed by warnings.filterwarnings("ignore"). Finally, there's one more library I want: from pandas_profiling import ProfileReport. Let's execute this. If you don't have Plotly, you can install it with pip install plotly, and similarly pip install seaborn for Seaborn; it's as simple as that.

Now let's create the figure. We obviously need a bar chart, so we call px.bar and pass the data set, cases_df. On the x-axis we want Virus, on the y-axis Percentage, and the color will be
Type. Finally, whenever I hover over the data I want to see the total, so we pass hover_data=["Total"]; every time I put my cursor on a bar, I'll get this value. Now let's show the figure with fig.show() and execute. I had a typo in the y parameter, so let me fix that and execute again, and we get a graph. If I remove hover_data, hovering shows only the general percentage and labels; with hover_data set to the total, hovering also shows the total number of cases.

So what does this graph represent? It's a bar chart with the virus on the x-axis and the percentage on the y-axis. The virus doesn't change, since we all know it's COVID-19, and the percentages show that around 85 percent of cases have recovered, active cases currently account for about 12 percent, and deaths so far account for about 2 percent of all cases. This is a good sign: recoveries far outnumber deaths.

Next we'll plot the same thing for new cases, newly recovered people and new deaths. I won't go through the entire code again; all I do is change the columns to New Cases, New Recovered and New Deaths and execute. As you can see, this shows what happened in the past 24 hours: the total
The new cases account for around 52 percent, and the people who have recovered for about 46 percent, which is pretty good, and we can see very few fatalities across the globe. This isn't related to a particular country; it's for the entire world. Finally, let's also take the columns we added earlier, the percentage increase in cases, recoveries and deaths, again with respect to the virus. This time, instead of plotting a chart like the previous ones, I'll plot a plain bar graph, the kind you're probably used to. A few things will remain the same, but let's create a new data frame from the start. We'll call it perc, for percentage increase, and it will be np.round of a slice of our data frame, since we want to round off the figures. From the data frame we take the increase in cases, increase in recoveries and increase in deaths columns, so let me copy those column names and paste them here, enclosed in square brackets and separated by commas. We only want the world data, so we take .loc[0], and we want two decimal places. Then we put this into a data frame with pd.DataFrame(perc), which we'll call per_df; let's see how it looks first. Now I want to name the column, so per_df.columns, and the name I'm giving it is percentage. All right, now let's plot this.
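The percentage-increase frame described above, and the go.Bar chart built from it next, could look roughly like this. The column names and values are hypothetical stand-ins for the columns added earlier in the notebook, and the three increase columns are assumed to already hold percentages, so 0.57 reads as 0.57 percent.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped table: row 0 is the world total and the
# "Increase in ..." columns are assumed to already be percentages.
df = pd.DataFrame({
    "Country": ["World", "Country A"],
    "Increase in Cases": [0.1234, 0.9],
    "Increase in Recovered": [0.5678, 0.8],
    "Increase in Deaths": [0.4321, 1.1],
})

cols = ["Increase in Cases", "Increase in Recovered", "Increase in Deaths"]
# .loc[0] picks the world row (a Series indexed by the column names)
perc = np.round(df[cols].loc[0], 2)
per_df = pd.DataFrame(perc)
per_df.columns = ["percentage"]


def plot_increase(per_df):
    # Imported lazily so the data preparation runs without plotly
    import plotly.graph_objects as go

    fig = go.Figure()
    # One bar per increase column, each with its own color
    fig.add_trace(go.Bar(x=per_df.index, y=per_df["percentage"],
                         marker_color=["yellow", "blue", "red"]))
    fig.show()
    return fig
```

Renaming the single column to percentage is what lets the chart refer to per_df["percentage"]; a misspelled columns attribute is the kind of typo the walkthrough hits at this step.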
It's the same drill: we create a figure, but this time with a different kind of graph. So fig = go.Figure() creates the figure object, and then we add to it with fig.add_trace. We need a bar plot, so go.Bar, and we pass in the values: x is the index, per_df.index, and y is per_df['percentage']. Since we have three values, we set marker_color to three colors: yellow, blue and red. Let's show the figure with fig.show() and execute this. We seem to have a small typo here, so let me copy this name, paste it over here and execute it again. Oh yes, we're getting this error because we made the same mistake in columns; it should work now. As you can see, we get the percentage increases here: cases have increased by 0.57 percent, deaths by about 0.45 percent, which is not good news, and recoveries by about 0.60 percent. So although the increase in cases is fairly high, as long as recoveries are increasing faster than cases, I'd say it's a pretty good figure to work with. So far we've been working on general visualizations of our data; now let's get more specific and visualize it with respect to continent. Let me add a heading here, 'With respect to continent', execute it, and add a few more cells. Now I'm going to create a data frame called continent_df. We don't have a separate table of continents, so to get the continent-level figures we can use a group-by, which combines the rows that share the same continent name. So df.groupby('continent'), then sum(), and then a drop. If you remember, earlier I added an extra piece of data to our data frame, so I'll restart the kernel and run all the cells again. That resets any values we changed along the way, which is something to expect when you work with a Jupyter notebook. Now that that's done, let me quickly fix this up and see what continent_df looks like. Based on the continent, we now have just six rows, because all the data for each continent has been combined into one row. Next we'll plot this with respect to continent, and for that it will really help to have a regular index with continent as an ordinary column, so continent_df.reset_index(); let's execute that. Now that we have the index, we'll create a function that plots the given data. If you remember, so far we hard-coded every value and every line of code and passed the data explicitly. From here on, we'll write one function, and all we need to do is pass it a list of columns; for each entry in that list, we get a chart. So def continent_visualization(vlist), where vlist is the list of data you'd like to visualize: if I pass total cases, it visualizes the total number of cases for each continent. Inside, we loop with for label in vlist.
Inside the loop, we create c_df, a data frame built from continent_df, taking the continent column and the label column. We also need the percentage, rounded, so np.round(100 * c_df[label] / np.sum(c_df[label]), 2). Here label represents whichever column we want, such as total deaths or total cases, and we keep two decimal places. Similar to what we did earlier, we also add a virus column, which is just a list saying 'COVID-19' for i in range(len(c_df)). Then we just have to plot the graph, so to avoid wasting time on typing, I'll copy the plotting code from above and paste it here. All we need to change is the data frame, to c_df, and the color, which is now continent: we have six continents, and each gets a different color. So this function takes a list of values, which can be total cases, new cases, total deaths or new deaths; it's entirely up to you. To make it clear what we're displaying, let me also add a title with fig.update_layout, where title takes a text, and for the text I'll pass the label. Let's execute this. Now we just have to pass lists of values, so let's create some. First, the cases list: total cases, active cases, new cases, the number of serious or critical cases, and finally total cases per million. So cases_list is a list, and apart from that we'll make lists for deaths and for recoveries. As I mentioned, the cases list gets total cases, active cases, new cases and serious/critical, which I can just copy from here, and then total cases per million, which I'll copy and put down here. Similarly, for deaths we take total deaths, then new deaths, and then deaths per million; where is that... yes, here we have deaths per million. All of these are with respect to continent. For the recovered list, we can take total recovered and new recovered, and if you want you can also add the percentage increase in recoveries; it's totally up to you, so let me add that here too. Now let me execute this and call continent_visualization; the only thing left to do is pass in a list. For this, let me rerun the cells from the start and see what happens; we might be getting an error because we added some new data here. It turns out the reason for the error is the percentage column here.
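Here is a sketch of the continent step: group and sum per continent, reset the index, then compute each continent's share for every column in a list. It assumes a Continent column and placeholder rows; unlike the continent_visualization(vlist) written above, this version takes the continent frame as an explicit argument instead of reading a global, and the helper names build_continent_df and continent_share are hypothetical.

```python
import numpy as np
import pandas as pd


def build_continent_df(df):
    """Sum the numeric columns per continent and keep Continent as a column."""
    return df.groupby("Continent").sum(numeric_only=True).reset_index()


def continent_share(continent_df, label):
    """Each continent's share (%) of the column `label`, like c_df above."""
    c_df = continent_df[["Continent", label]].copy()
    c_df["percentage"] = np.round(100 * c_df[label] / np.sum(c_df[label]), 2)
    c_df["virus"] = ["COVID-19" for _ in range(len(c_df))]
    return c_df


def continent_visualization(continent_df, vlist):
    """Draw one bar chart per column name in vlist (plotly imported lazily)."""
    import plotly.express as px

    for label in vlist:
        c_df = continent_share(continent_df, label)
        fig = px.bar(c_df, x="virus", y="percentage", color="Continent",
                     hover_data=[label])
        fig.update_layout(title={"text": label})
        fig.show()


# Placeholder rows, not real data
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Europe", "Africa"],
    "Total Cases": [100, 50, 80, 20],
    "Total Deaths": [5, 1, 4, 1],
})
continent_df = build_continent_df(df)
cases_share = continent_share(continent_df, "Total Cases")
```

Any aggregate row that is not a real continent (such as a world total) would need to be removed before grouping, otherwise it becomes an extra "continent". Calling continent_visualization(continent_df, ["Total Cases", "Total Deaths"]) then draws one chart per list entry.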
All right, let me execute this once again. As you can see, we don't get just one graph: we get a graph for each entry in the list, that is total cases, active cases, new cases, serious or critical, and total cases per million. This first one is total cases across the different continents. Let me zoom out. We have Africa, Asia, Australia, Europe, North America and South America, and from this we can see that the highest share of total cases is in Europe, at 30 percent. Next comes North America at 26 percent, then Asia at 24 percent, South America at about 16 percent, and Africa at around 3 percent. Those are the total cases; now let's look at active cases. Active cases are highest in North America, specifically the US. Second place is a competition between Asia and Europe, but as of now Europe is second with 25 percent, followed by Asia with 22 percent. If you look at new cases, Asia is leading. What we can figure out from this graph is that in the early days the number of cases was high in other continents, but now, with the second wave, there is a rapid increase in cases, especially in Asia, which accounts for 54 percent of new cases. Next we have South America and then Europe, and new cases in North America, mainly the US, have dramatically decreased. Africa accounts for just 1.3 percent, so we can safely say there are not many new cases there.
We can't even see Australia here, so the number of new cases in Australia is negligible. For serious or critical cases, the most are in Europe, followed by Asia and then South America. Total cases per million is highest in Europe, at about 50 percent; note that this is Europe's share of the per-million figures summed across continents, not 50 percent of people being affected. Then we have about 16 percent for Asia and 16.62 percent for North America, so Asia and North America are almost neck and neck. So now we've seen how to plot with respect to continent; let's narrow our view and plot the data with respect to country. Let me quickly add a comment here, give it some space, write 'With respect to countries', and make it a markdown cell. First, we'll create a data frame. We don't want the first row, because it's the world data: if you look here, the zeroth index refers to the world, and 'World' is obviously not the name of a country, so we drop it and keep the rest of the countries. So country_df = df.drop(0), with the 0 inside these parentheses. Now if we look at country_df, the world data is gone, and we have 210 rows, that is 210 countries, and 18 columns, that is 18 features. Next I'll get straight down to looking at the top five countries affected by COVID. We'll create country, the list of columns we want, which is every column except the zeroth one, because the zeroth one holds the name of the country, so that's country_df.columns[1:14].
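A small illustration of the two selection steps just described, on a placeholder frame: drop(0) removes the row labelled 0 (the world total), and a positional slice of the column names skips the name column. The column names are assumptions, and 1:4 stands in for the 1:14 used on the real 18-column table.

```python
import pandas as pd

# Placeholder table (not real data): row 0 is the world aggregate,
# column 0 holds the country name, the remaining columns are features.
df = pd.DataFrame({
    "Country": ["World", "Country A", "Country B"],
    "Total Cases": [300, 200, 100],
    "Total Deaths": [6, 4, 2],
    "Population": [3000, 2000, 1000],
    "Continent": ["All", "Asia", "Europe"],
})

# drop(0) removes the row whose index *label* is 0; the remaining rows keep
# their original labels (1, 2, ...), which is what .loc[i] relies on later.
country_df = df.drop(0)

# Positional slice of the column names: skip the name column (position 0)
# and stop before trailing columns we don't want to plot.
country = country_df.columns[1:4]
```

Because the labels are preserved rather than renumbered, looping over country_df.index and reading country_df.loc[i] still lines up with the original rows.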
The reason I'm stopping at 14 is that I don't need the columns after it, such as continent or the percentage increases; I just need everything up to population. Now we create a figure object, fig = go.Figure(), and set c = 0. Then for i in country_df.index, if c < look_at, where look_at is a variable we'll define that says how many countries to select, such as the top five or top three, we call fig.add_trace with go.Bar. The name is the country name for row i, so we get one trace per country; x is the country list of columns, and y is country_df.loc[i] sliced to columns 1 to 14. Otherwise, if the condition fails, we break, and every time through the loop we increase c by one. Let me zoom in. Now we update the figure with fig.update_layout to give it a title, and for now its text is look_at, and then fig.show(). I think this should be fine. Now we just have to set look_at, so let's give it 5 for the top five, and execute this. The chart doesn't look right, because a few of the values dwarf the others: the population value is huge compared to everything else. So instead of plotting the raw numbers on a linear axis, I'll use a log scale, which compresses the large values so that all the bars become visible.
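A quick numeric check of why a log axis helps here, using placeholder magnitudes rather than the real figures. On a linear axis a bar's height is proportional to its value; on a log axis a value's position tracks log10(value), so values eight orders of magnitude apart stay readable.

```python
import numpy as np

# Placeholder magnitudes (not real data): a population-sized value next to
# much smaller values such as per-million rates
values = np.array([1e9, 1e7, 1e3, 1e1])

# Linear axis: height relative to the tallest bar; the smallest bar is a
# 1e-8 fraction of the tallest and effectively invisible
linear_share = values / values.max()

# Log axis: each value's position tracks log10(value), so every bar shows up
log_heights = np.log10(values)

print(linear_share)
print(log_heights)
```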
First, I'll change the title to 'Top countries affected'. Apart from that, I'll add one more parameter to update_layout: yaxis_type='log'. I think that should do it, so let me execute this. As you can see, once we switch to a log scale, the data is much easier to compare: on a log axis equal distances represent equal ratios (each factor of ten takes the same space), so values don't shoot up all of a sudden and the small bars are no longer dwarfed. So what are we seeing here? These are the top five affected countries: first the US, then India, Brazil, France and Russia. The total number of cases isn't that much help; the important thing to look at is new cases, and you can see the number of new cases is increasing in India. India's population is around 1.39 billion, and almost 10 million of those people have been affected. Now look at new recoveries: if I compare new cases with new recoveries, I'd say we're right on track, because new cases and new recoveries are almost neck and neck, or new recoveries are even ahead, which means less stress on the medical services. For total tests per million, the US and France rank high here. The number of deaths in India is surprisingly low: deaths per million is 131, which I'd call very low. Total deaths are still high, but there aren't that many new deaths, and the highest number of deaths here is in Brazil.
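A sketch of the top-N country chart with the log-scaled y-axis, assuming columns named Country, Total Cases, Deaths/1M pop and Population (hypothetical names) and placeholder rows. The walkthrough steps through the index with a counter c and a break; this version swaps that for sort_values followed by head, which also makes the ordering explicit instead of relying on the table already being sorted. top_countries and plot_top_countries are hypothetical helper names.

```python
import pandas as pd


def top_countries(country_df, look_at, sort_col="Total Cases"):
    """First `look_at` countries by `sort_col` (the explicit sort is an assumption)."""
    return country_df.sort_values(sort_col, ascending=False).head(look_at)


def plot_top_countries(country_df, feature_cols, look_at=5):
    # Imported lazily so top_countries works without plotly installed
    import plotly.graph_objects as go

    fig = go.Figure()
    for _, row in top_countries(country_df, look_at).iterrows():
        # One bar trace per country; clicking its legend entry hides or shows
        # it, double-clicking shows only that country
        fig.add_trace(go.Bar(name=row["Country"], x=list(feature_cols),
                             y=row[list(feature_cols)].tolist()))
    # A log y-axis keeps population-sized columns from dwarfing the others
    fig.update_layout(title={"text": "Top countries affected"}, yaxis_type="log")
    fig.show()
    return fig


# Placeholder rows, not real data; labels start at 1 as after drop(0)
country_df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total Cases": [50, 400, 300, 10],
    "Deaths/1M pop": [5, 20, 30, 1],
    "Population": [10_000, 90_000, 70_000, 5_000],
}, index=[1, 2, 3, 4])
top3 = top_countries(country_df, 3)
```

Changing look_at (for example 10 or 3) is all it takes to show a different number of countries, as in the video.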
So that's how we can get it done. Another thing I'd like to show you is Plotly's interactivity: if I click on USA in the legend, the chart shows just the data for the USA. You can see the total number of cases in the US and the population of the US, and the total number of tests is almost as large as the entire population, although repeat tests mean this is not the same as everyone having been tested. Similarly, if I go to India, this is the population, and according to the data this is the total number of tests performed, which is a large number, though well below the population. The total number of cases is high and new cases are quite high too, around 294,000, but deaths are quite low, and new recoveries are close to the number of new cases, which is a good sign. Deaths per million are also low: 131 people dying per million. I do agree that even a single death is a calamity, but we are doing better than other countries; if you look at the data for France, deaths per million is 1,554, and for Russia it's 728. If I want to see the top 10 countries, I can look at that too, and now we get the top 10. And if you want just the top two or top three, you'll get just the top three, that is the US, India and Brazil. All right guys, with this we come to the end of our session. I hope you enjoyed it and learned something new. If you have any further queries, please mention them in the comment box below. Until next time, goodbye and stay safe. I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the edureka channel to learn more. Happy learning.
Info
Channel: edureka!
Views: 16,748
Rating: 4.9797468 out of 5
Keywords: yt:cc=on, analysing covid 19 trends using python, covid analysis using python, covid-19 latest prediction, covid-19 prediction with data science, exploratory data analysis, exploratory data analysis python, data analysis using python, data visualization using python, python data analysis and visualization, data science projects for beginners, data science python project, data science python for beginners, edureka python, python training, edureka training, Edureka
Id: 3ZacJ9zRVOU
Length: 64min 17sec (3857 seconds)
Published: Sun Apr 25 2021