Scraping FIFA World Cup data using Selenium and Python

Happy New Year, internet! Daily Code here again. In this video we are going to use Selenium to scrape the FIFA site, this site here, to get data about the FIFA World Cup matches that happened towards the end of last year. By the end of this video you should be able to use Selenium to interact with buttons on a site (like these buttons here), scroll web pages, and scrape data. So buckle up, subscribe if you haven't, and let's get started.

Selenium is an open-source tool that was primarily created by (I don't know if I can pronounce his name well) Jason Huggins in 2004 to help him automate tests carried out on web browsers. To this day, Selenium is used to automate testing frameworks that validate web applications across several browsers: Firefox, Chrome (like the Google Chrome that I'm using here), and other platforms I have not mentioned. You can use several programming languages with Selenium. In our case we're going to use Python, because our channel is mostly Python and SQL, and maybe some other data analysis tools. I suggest that you code along with me if you're learning, so that you can build muscle memory.

Apart from all that yapping, I think we should get started. What we want to do is get the data from here. Make sure that you have Python installed on your computer. I'm going to use pyenv; as usual for all projects on this channel, I'm using pyenv to manage my Python versions. On your computer you should make sure that you have Python installed and that you have pip, because we're going to use pip to install packages for Python.

I created this folder here; it is on my desktop somewhere. This folder, YouTube, contains projects to do with YouTube. I'm going to create a folder called FIFA, then I'll move inside FIFA. I'm working in the terminal. This is for Mac, but you can use Command Prompt to create folders wherever you'll be
working, if you are on Windows; if you're using Linux, I think you can find the commands for your terminal. I'm going to use pyenv local and specify Python 3.10.0. I know that there is Python 3.11, but I have not yet installed it on my computer and I think it may cause problems with the M1 chip; I haven't tried it yet. So we shall use Python 3.10.0. If I check what is in my local directory, we already have that Python version here.

Then I'm going to create a virtual environment: python -m venv, creating a virtual environment called venv. We can look at what is inside our folder now: it's venv. If you want to look at everything, yeah, we can see everything that we've just created. Then we activate the virtual environment and install the packages that are going to be useful for this project. First, pip install selenium. I don't know how it is pronounced; it depends on which university or school you're from. Which other package am I going to need? I'm going to need webdriver-manager. I don't remember how webdriver-manager is spelled, so I'm going to search for it. Yeah, it has a hyphen between "webdriver" and "manager". Let's install and see.

Now let's launch VS Code. You can use any IDE or code editor of your choice, but for this project I'm using VS Code, and I would prefer you use VS Code too. (Sorry, I was supposed to cut this.) This should launch VS Code for me here. Then I'm going to create a file called script.py, and we are going to import the necessary packages. From selenium I'm going to import webdriver. Let me make this bigger. I'm going to import several packages that are useful for this project. webdriver provides all the driver implementations. The driver is the one that is going to be interacting with the site that we send it to. It is like a bot that will be coming here and
then it will click if you tell it to click, or fetch data if we tell it to fetch data, and so forth. That is what this webdriver is going to be doing.

Then we are going to import Service: from selenium.webdriver.chrome (because I'm using Google Chrome; if you're using another browser that Selenium supports, you'll go that route) .service, import Service. Service gives us the capability of running other executables for the webdriver; for example, if you want to have a plugin, we would use Service to run it. Here, Service will mainly help us run the driver that ChromeDriverManager installs. So let's import ChromeDriverManager: from webdriver_manager (because we installed it) .chrome, import ChromeDriverManager. Previously, and if you look at videos that do web scraping with Google Chrome, they point you to a driver that you have to download and then link to your webdriver when you're instantiating it. Here we shall use this package to cover all that, so we will not need to download any driver and link it to the webdriver ourselves.

Then we are going to have a package for settings, for when we want to provide extra settings to the driver. From selenium.webdriver.chrome.options we're going to import Options. Options will help us provide extra settings for the functionality of the driver.

Then we are going to import a package that helps us define how we are going to locate data: from selenium.webdriver.common.by we import By. There are several ways of locating a resource on a site. For example, if we are trying to locate this resource, we go here, More tools, Developer tools, then we point at this resource here, and then we have a
tag, so we'll say: look at this resource by this tag. Or we could use an XPath; an XPath gives us a path directly to this resource. So By will help us define how we want to locate a resource.

When we interact with a web page, for example this one, we send commands to the browser. You could send a command to the browser using the keyboard: if I press the down arrow key I'm scrolling down, and with the up arrow key I'm scrolling up. These commands are sent from the keyboard, so we may want to use keys to interact with the web page. I'm going to import a package that helps us send keys to the driver, to interact with the page as if from the keyboard. It's a simulation, not you doing it. So from selenium.webdriver.common.keys we're going to import Keys.

Then we will need to make the driver pause at some moments while it is interacting with the page. Let me show you, for example; I'm going to open this in incognito mode. You see, it loads this, but this page takes long before this part here for cookies appears, and we have to wait. We need the time package to wait, so import time. Then pandas for data wrangling and handling of data: import pandas as pd. These are the packages that we're going to be using for scraping this site.

Let's remind ourselves of the goal for this video, which is to get the data. The data that we're going to get is this: the scores for all the matches of the FIFA World Cup. Plus, if you click here on a match, you get other statistics about that match. We want to get all that data. This is the link to the website, so I'm going to say that website is just that link.

Then I'm going to begin by putting in the settings of the driver. I'm going to instantiate my options: options = Options(). This Options has a capital O and this one has a lowercase o, just in case you feel like I'm confusing you. Also here, this options has
all lowercase and this one has an uppercase O; this by has a lowercase b and this one has an uppercase B; this one has a lowercase k and this one has an uppercase K.

The first option that I'm going to set makes sure that the driver does not automatically close the browser when it finishes execution. But let me first have my driver; let me define the driver here. I think we're going to have two drivers, so I'm going to call this one driver_1. It is going to be webdriver.Chrome, and that is my driver. Now we need to link it to a driver manager. Previously we would just put an executable path to a driver which you would have downloaded from a certain link, but now we can use ChromeDriverManager to download, install, and manage it for us. To do that we need a service, and the service will be this Service here, which helps us plug in the driver that ChromeDriverManager installs: ChromeDriverManager().install().

Let's see it at work. We do driver_1.get, which sends a GET request, and the request we are sending is to website, so we're just getting that link. You're going to see what I meant here; let me run this. It is first downloading the Chrome driver, because we don't have it installed in our virtual environment; it will download it and install it. What it's doing currently is this part here: installing the driver for you. It is launching on the side in this other window and opening the site, and then it has automatically closed it. That is what I was telling you: we need to send a setting to the driver so that it does not automatically close. The setting that we're going to send goes through options. We already instantiated options here, so options.add_experimental_option, and we're going to set "detach", so that once it's running we can detach from it and it will not
automatically close. So "detach", and detach should be True, so that it does not automatically close after this has run. We can try it again. (I have conda activated; I'm just going to deactivate it for a second. This is not part of the project.) So I run it with python. It has launched on this side; I don't know why it launches in a separate window, but that's fine. Oh, it has automatically closed, and the reason is that we have not passed the options here: options=options.

Also, my VS Code automatically saves my files, so I don't need to click save every time. You can set it so that it saves every time you move out of a window. I'll be dragging this window every time it launches. Yeah, so this has launched; let's see if it is going to automatically close. Plus, it is telling us that "Chrome is being controlled by automated test software", so this is being controlled by the Selenium driver. Et voilà: you can see that it has not automatically closed, and that is because we passed this setting to the driver inside the options.

To recap: the driver interacts with Chrome; it is the one that helped us launch this site in Chrome. The service helps us install the driver through the driver manager. If you have watched other videos, you will have seen them linking to a driver that they downloaded from somewhere.

Next, we want to accept cookies. You see, when the page is launched here we can't do much; maybe we could scroll, but we could not do a lot of stuff. So the first thing we need to do is tell the driver to click this button, "I'm OK with that", and we're going to define that as a function: accept_cookies. What you should know is that this accept_cookies function is tailored for only this site. It cannot work for every site, because sites have different ways of building the front end; it's not a universal
way of styling and building a front end. So accept_cookies should take in the driver; here I'm just going to say driver_object, so it can take any driver. Then, for people who are going to read your code in the future, you need to leave a comment, so we describe what it does: it presses the accept button. Here the button is not labeled "Accept cookies" but "I'm OK with that"; still, it is the cookies button.

When we launch the browser and request this site, we should wait for 15 seconds, so we do time.sleep and sleep for 15 seconds. When I'm editing this video I'll be skipping the sleep bit to make it quick, but just know that it takes 15 seconds. Then we need to find the accept cookies button, so I'm going to look for that element. I'm going to have accept_cookies_button, the button element on the site, equal to driver_object (the driver object that we passed in here) .find_element. I'm going to find it by XPath, so By.XPATH; you're going to see what I mean by XPath. Then I'm going to put a variable here, cookies_button_xpath; we still need to find that path. So here we have driver_object.find_element, By.XPATH, and the XPath to the cookies button.

Let's find the XPath to that button on the site. Let me make this bigger first, then I go to More tools, Developer tools. I've clicked this pointer button and I'm going to point at the "I'm OK with that" button. This button has an ID, so what I'm going to do is find the button with this ID. To try an XPath, I'm hitting Command+F on my keyboard because it's a Mac; I think you can use Ctrl+F if you're not on a Mac. This bar here opens, where you can type a double slash. I'm seeing a button element, so I'm going to type "button". So here, the button has the ID.
Let me point at it again: this is the ID. I'm going to click here and copy it. So the XPath is: square brackets, the @ symbol, id, and paste the value. This is the XPath to this button, because it is showing us only one item. You may type an XPath like this and find several items; this one shows "one of one", but you could have "one of 100", and that would mean that there are several elements matching the path that you've typed. For us, a button with this ID matches only one element. So I'm going to copy this, and this is the XPath there.

Now that we have found the cookies button element, we can click it: accept_cookies_button.click(). That is our function for accepting cookies. I am going to move this function up; I'm going to cut it and paste it here, so we start with the function for accepting cookies. Then, after we launch the browser here, we call accept_cookies, and the driver object is this driver_1. Let's run our code again and see what it does. Now we are waiting for 15 seconds to see if it is going to accept the cookies on its own. See, my mouse is over here. Let's wait and see. Yep, you have seen that the cookies section has disappeared. That means it has accepted the cookies, and now we are ready to interact with this site.

Next, after it has accepted cookies, I want it to scroll down automatically, maybe three times. We're going to send commands as if from the keyboard (again a simulation) to the driver so that it can scroll down. I'm going to define a function and call it animate_scroll. Let me take this one down a little. So define animate_scroll; it also takes a driver object, and for our friends who will be interested in our code we leave a note here: "Scrolls the web page down". So let me just
use "for scrolling down"; that is enough. I want to scroll down three times, but let's scroll down once first. I'm going to take driver_object and find the body, because I want to scroll from the body. So driver_object.find_element, and this time around we're not finding by XPath but by tag name, and the tag name should be body. After finding the body, this becomes the body element, and then we are going to send keys: send_keys. The key we are going to send is Page Down, so I'm going to say Keys.PAGE_DOWN. We call this after we have accepted cookies, because it will not be useful before. So after accepting cookies we scroll once; we hit the Page Down key once. Let me run this. Yeah, we have seen that it has scrolled down once.

Let's make this kind of fun. I'm going to make it sleep for a second, then make it scroll four times. So I'm going to put a loop here: for _ in range(4), do this, but before doing it, sleep, so time.sleep for a second. Before running it again I just want to close this window: quit that, and quit that. Then let's run it again. Yep, the whole function works; it has scrolled four times.

Next we are going to try to scrape this site. Instead of scraping this page directly, we're going to get the links to where the statistics are. (The link to this site will be in the description, so that you can also interact with it.) Remember, when you click on one of these cards here, let's say France vs Australia, the link changes and it takes you to the France vs Australia page, which also has the score here, which you can pick, plus the statistics that we want to pick. We want to pick all
the statistics. So the first thing that we are going to do is get the hyperlink for each of these cards. Let me show you the hyperlink that we want to get. When you go to Developer tools... oh, this takes long because it is the automated window, so let me redo that in a regular window, with the link here. Then I go to Developer tools and point at a card. You can see this card here; when you click on it, it takes us to a separate page, so we have to look for that hyperlink. I'm going to open this; it is just showing us the teams. When I go further up, yeah, here's the anchor tag, and the hyperlink is this. If I click on another card, it has its own hyperlink up here. So we're going to fetch all those hyperlinks for all the matches. I don't remember exactly how many matches there are, but after fetching them we're going to count them and see if my theory is correct.

Since we are collecting several things, we're going to instantiate a list. We first accept cookies. I don't want to scroll before collecting, because that is now a waste of time; we can scroll after we have gotten everything. So the first thing that I'm going to do is get all the elements which match a particular XPath. I'm going to call them anchor elements, because these elements are just anchor tags. So anchor_elements will be driver_1.find_elements. The difference between find_element and find_elements is that find_element finds just one element, while find_elements is plural: it gets every matching element, if any exist, and puts them in a list. We're finding these elements by XPath, and we are going to say that the XPath
here is going to be anchor_xpath. Let's define our anchor_xpath here. To find it, I go to the page, hit Command+F, and look for the div containing that anchor. You see this div here; it has this class. So I put a double slash, then div, and I want a div that has a class; let's go back and copy this class here. Yeah, these are the div elements, and there are 64 of them. I remember there were 64 matches. But I don't want this div element; I want the anchor tag inside it, so I'm going to add a slash and a. Now I have access to the anchor tag, and here we see it is "one of 64". So I think we are right to have 64 anchor tags, one for each of the 64 cards here, up to the final. This is the anchor XPath, so I'm going to copy it and put it here.

Now that we have the anchor XPath, we're going to iterate through all the anchor elements and extract the hyperlinks. First of all, all_hyperlinks should be equal to an empty list; we are just instantiating all_hyperlinks. Then: for anchor_element in anchor_elements. For each element in there, hyperlink is going to be anchor_element.get_attribute, because we want to get an attribute from it, and the attribute we're getting is href. This is going to be the link for one of them, and we append it to all_hyperlinks: all_hyperlinks.append, and I put hyperlink.

After getting all of them, we're going to print them to be sure that we've got them, so print all_hyperlinks. I also need to know the count, so I'm going to print the length of all_hyperlinks. Then we're going to scroll, and then after
scrolling I want to quit that window, so that I don't have to quit it with my mouse every time; it should quit automatically. So driver_1.quit(). Let's run this and see what happens. I think I was right: we have all these hyperlinks to the matches printed here, and we expected 64 of them. Scrolling up, this 64 comes from this print statement, and this is the list that we have just printed. You could try visiting one of the links; let me visit this one. It should take us to a particular football match. Yep, that link works.

Now that we have all the links, we can loop through them and access the statistics, because that is what we want, and we can begin scraping from there. So the first thing we are going to do is loop through each hyperlink. After driver_1 has quit, we come here. I'm going to leave a comment: "Loop through each hyperlink and scrape the statistics". The loop is going to look like this: for hyperlink in all_hyperlinks, we define a second driver the same way we defined the first. Remember, here we already quit the first driver, so we can call this one driver_2; I'm calling it driver_2 so that you can tell which driver I'm using. driver_2 is defined the same way as the first driver here. Then we say driver_2.get, and now we are not getting website but hyperlink, so we can be sure that we launch each match page through its link. The next thing we're going to do is accept cookies: accept_cookies, and the driver object is driver_2.
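The setup and link-collection steps described so far could look roughly like the sketch below. It is not the video's actual script: `make_driver` and `collect_hyperlinks` are hypothetical helper names, `ANCHOR_XPATH` uses a placeholder class value because the real one is only shown on screen, and the locator strategy is passed as the plain string "xpath" (the value of Selenium's `By.XPATH`) so that the helper can be tried without a browser.

```python
# Assumption: placeholder locator. The real class value comes from the site's
# Developer tools and is not spelled out in the video.
ANCHOR_XPATH = '//div[@class="PLACEHOLDER-match-card"]/a'


def make_driver():
    """Build a Chrome driver that stays open after the script finishes."""
    # Imported here so the helper below can be read and tested without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    # "detach" keeps the browser window open when the script ends.
    options.add_experimental_option("detach", True)
    # ChromeDriverManager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def collect_hyperlinks(driver_object, anchor_xpath=ANCHOR_XPATH):
    """Return the href of every anchor element matching the XPath."""
    # "xpath" is the string value of selenium's By.XPATH.
    anchor_elements = driver_object.find_elements("xpath", anchor_xpath)
    all_hyperlinks = []
    for anchor_element in anchor_elements:
        all_hyperlinks.append(anchor_element.get_attribute("href"))
    return all_hyperlinks
```

With a real browser, one would call `make_driver()`, load the site with `.get(website)`, accept cookies, and then pass the driver to `collect_hyperlinks`; in the video this step yields 64 links.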
You saw that when I was interacting with this page, I first clicked on Stats. Let me copy a link again from the terminal here; copy, then paste it here. It didn't copy, so let's copy this link again: copy, paste. After reaching this page and accepting cookies, the first thing I did was click on Statistics, or Stats. We need to simulate that, so that when the driver comes here it first clicks on Stats. To do that we need the XPath of the Stats button here. I go to Developer tools, then point at Stats. This is the element for Stats; it is a div with this class. I'm going to copy the whole class definition, all the classes for that div. Then over here I hit Command+F, type a double slash, then div with @class, and paste the class. I have only "one of one", so this is the XPath for it. Then I'm going to define the XPath for Stats; I'll call it stats_tab_xpath, because it is a tab, equal to that. Then we get the element itself, stats_tab, and that element is going to be found by driver_2.
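The XPath typed into the Developer tools search bar here follows a simple pattern: a double slash, the tag name, and a predicate on the class attribute. A small sketch of that pattern is below; `xpath_for_class` is a hypothetical helper for illustration, not something from the video or from Selenium.

```python
def xpath_for_class(tag, class_value):
    """Build an XPath such as //div[@class="..."] for a tag and class string."""
    # The value is wrapped in double quotes, so it must not contain any itself.
    if '"' in class_value:
        raise ValueError("class value must not contain double quotes")
    # @class="..." compares against the ENTIRE class attribute string.
    return f'//{tag}[@class="{class_value}"]'
```

For example, `xpath_for_class("div", "tab selected")` gives `//div[@class="tab selected"]`. Because the predicate compares the whole attribute string, such an XPath stops matching as soon as the page adds or removes one class on that element.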
So driver_2.find_element; we're using the singular find_element. Since we're using XPath, we say By.XPATH, and that is the XPath to it. Now that we have found it, what we want to do is click it: stats_tab.click(). When the driver comes to this page it arrives like this, and then it clicks this button so that it activates the information under it. After that I want to call animate_scroll and then quit for now, so that I can see that it clicks. So animate_scroll (we already wrote this function) with driver_2, then driver_2.quit().

I'm not going to run it for all the hyperlinks; I think I should run it for the first three. So the browser will launch four times. The first time it will be getting website, this one; it gets all the hyperlinks and puts them in all_hyperlinks. After that we loop through each hyperlink: we launch the browser, click on the Stats tab, then scroll. I don't want it to print a lot of things on my terminal, so I'm going to remove these print statements. Then let's run it and wait and see what it does.

There's an error: it cannot locate the accept button. That is line 48, in accept_cookies. Let's try sleeping for 20.
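The per-match loop described above can be sketched as follows. This is an illustration under assumptions, not the video's exact code: `visit_matches` is a hypothetical wrapper, the XPaths are passed in as parameters because their real values are only visible on screen, and the strings "xpath", "tag name", and "\ue00f" stand in for Selenium's `By.XPATH`, `By.TAG_NAME`, and `Keys.PAGE_DOWN` so the sketch can run without a browser. The try/finally is an addition of this sketch, so that a failed lookup does not leave a window open.

```python
import time

XPATH = "xpath"        # string value of selenium's By.XPATH
TAG_NAME = "tag name"  # string value of selenium's By.TAG_NAME
PAGE_DOWN = "\ue00f"   # string value of selenium's Keys.PAGE_DOWN


def accept_cookies(driver_object, cookies_button_xpath, wait_seconds=15):
    """Wait for the cookie banner to load, then click its accept button."""
    time.sleep(wait_seconds)
    driver_object.find_element(XPATH, cookies_button_xpath).click()


def animate_scroll(driver_object, times=4, pause_seconds=1):
    """Send Page Down to the <body> element `times` times."""
    body = driver_object.find_element(TAG_NAME, "body")
    for _ in range(times):
        time.sleep(pause_seconds)
        body.send_keys(PAGE_DOWN)


def visit_matches(all_hyperlinks, make_driver, cookies_button_xpath,
                  stats_tab_xpath, limit=3, wait_seconds=15, pause_seconds=1):
    """Open the first `limit` match pages, click the Stats tab, scroll, quit."""
    for hyperlink in all_hyperlinks[:limit]:
        driver_2 = make_driver()  # a fresh browser per match, as in the video
        try:
            driver_2.get(hyperlink)
            accept_cookies(driver_2, cookies_button_xpath, wait_seconds)
            driver_2.find_element(XPATH, stats_tab_xpath).click()
            animate_scroll(driver_2, pause_seconds=pause_seconds)
        finally:
            # Quit even if a lookup fails, so no stray windows are left open.
            driver_2.quit()
```

Slicing with `all_hyperlinks[:limit]` is what restricts the run to the first three matches; `wait_seconds` is the value that gets tuned when the cookie banner loads slowly.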
Twenty seconds... let's make that half a minute of sleep; I think that is enough time. Then let's run this. I'm going to stop this, because it's scrolling before it clicks. So I first stop this, then quit this window.

Oh yep, I think I have found where the problem is. Let's go back to Developer tools and point at this. I think instead of using that XPath we could use the XPath for the container of all the tabs, the tab titles. It seems to be failing because when a tab is activated its classes change: some classes are added and some are removed. So what we're going to do is get the container div for these tabs. I'm going to have the div with the class, so @class, and that's the class; I just copied it. Inside it there are several other divs, so I'm going to put a double slash and then div. It's showing one of seven, not four; there are seven: one, two, three, four, five, six, seven. Then I want to index into them and get only the fourth one, which is Stats. So let's change the path a little; it is going to be clear when I paste it here. What happened is that I looked for a div which has this class, and this div is the one that contains all the tab titles. After that I stepped down to the divs inside it, which gave us seven divs, and I chose the fourth. In Python numbering begins from zero, but in XPath, which Selenium is using here, numbering begins from one, so we count these as one, two, three, four. That's why I index it here with four.

Let's try again and see if it is going to click and then scroll, because when it scrolls we expect it to show us the statistics. There has been an error. Oh, it seems I'm trying to click too fast, but I think 30 seconds of sleep is enough. Let me try to run this again. Or I was trying to click something
else. Let me see... it is the onetrust-accept-btn-handler button; it could not find it. Let's try running again. Let me close everything: quit that, quit this one, Ctrl+C. Let's run it again and see what happens. It seems that if your internet is slow you'll need to give the page a lot of time to load fully; it could be a whole minute. My internet has been slow today, so that's why I've given it 30 seconds. If you have fast internet, you could use even just five seconds.

Yeah, it has now clicked. We expect it to fetch all the hyperlinks, then scroll down; it has done that. Then it quits, and now it starts with the first hyperlink, which is the game for Ecuador, I think. Then it waits for the cookies part to load and accepts the cookies. After accepting cookies it should click on the Stats tab, then scroll down. If you rewind a bit you'll see that it moves to Stats and then scrolls down, which is the behavior we were expecting. Let's see the next game: it should first accept cookies, second click on the Stats tab, and third scroll. I think we are now confident that this is working the way we want. Let me quit this and press Ctrl+C here.

The next thing is to start scraping this. We're going to scrape possession, total, conceded, inside penalty area, outside penalty area, all this stuff, plus the score, and then put that in a CSV somewhere. We're going to write a function to extract this data, and I'm going to call it extract_data. I'll write it on top here. So I define a function called extract_data; it also takes a driver object, and it returns, I think, a list or a tuple of lists. The first thing to do is to get the general information, for example the kind of match, the
kind of match, I mean: is it a group match, a quarter-final, a semi-final, or the final? Then we are going to get the date and time when the match took place, this one here. Then we get the first team; I'm reading from left to right, so the first team is Senegal and the second one is Netherlands. Then we get the scores here. Then we are going to split this data, do some cleaning, and so forth.

Let's start. The first thing is to get the match kind. So match_kind will be equal to driver_object.find_element, and we're going to find the element by XPath. Let's figure this XPath out together. I think by now you know what the XPaths for these elements look like, so we'll work out this one and the rest I'll copy from the notes that I have on the side. So: Developer tools, then I click here and come to the word Senegal. What we're looking for first is the match kind, so I come here to "Group A", and I'm looking for this paragraph here. First of all I copy the class, because it gets lost when I start writing the XPath. Then Ctrl+F or Command+F, then a double slash; I'm looking for a paragraph with the class, so p, @class, and that's the class. You can see this is showing "Group A" and it matches only one item. I'm going to copy this and paste it here, and that is the match kind. We can print match_kind so you can see it on the terminal. Then we use this function within the loop down here: after clicking the Stats tab we call extract_data, and the driver object is going to be driver_2.
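A minimal sketch of extract_data at this stage is shown below. The XPath is a parameter here because the real class value is only visible on screen, and "xpath" stands in for Selenium's `By.XPATH`; both are assumptions of this sketch. One detail to watch: find_element returns an element object rather than a string, so the sketch reads the element's `.text` attribute to get the visible label.

```python
def extract_data(driver_object, match_kind_xpath):
    """Return the match kind (for example a group name) shown on a match page."""
    # "xpath" is the string value of selenium's By.XPATH.
    match_kind_element = driver_object.find_element("xpath", match_kind_xpath)
    # .text holds the element's visible text; strip() drops stray whitespace.
    match_kind = match_kind_element.text.strip()
    return match_kind
```

Inside the loop this would be called right after the Stats tab is clicked, for example `print(extract_data(driver_2, match_kind_xpath))`, where `match_kind_xpath` is whatever was copied from Developer tools.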
So let's run this and see the output. We are expecting about three outputs on the terminal, and the outputs will be the names of the groups of the matches, I mean the kind of matches. Oh, and one thing that I have forgotten to put here in extract_data: this will return an element, and what I want is to return text, so I add dot text. Let me first quit this, then run it again and wait for the output. Yep, it is coming. It decided to launch in a separate window, but that's fine. I think bringing something like this onto this window might have confused it. Let me close it: Ctrl+C here, then close this, then run it again. Hopefully it will launch on the primary screen. Yep. If we look at this, we can see that it has picked Group A, and that is how you use XPath for locating elements.

Now I'm going to write the same kind of code as this match_kind one to locate several elements. Finding the XPath for each of the elements takes long, so I'll be copying the XPaths from the notes I have on the side, but it is done the same way: just go to developer tools and get them. The next thing we're going to do is get the date and time, so I'll put date time. It is still the same: driver.find_element, then the text, and the XPath for the date and time is this. I'm copying it from the side so that we can save time, but you'll find the link to the code in the description in case you want to look at the XPath, or you can try to find the XPath yourself; make sure you find the XPath to this element here. After getting the time, there is a weird character in the text, this dot that you're seeing here. I'm going to replace it with a minus. It is a hanging
or floating dot, so we do a replace: we replace this weird big dot with a minus sign, and that is how the date is going to look. I'm going to copy in the rest of the items, then explain the items I have got and how I'm fetching them.

Here we have all the elements that we are trying to pick, so we can look at them one by one. We get the first team; the first team here is Qatar. That is the same thing: driver.find_element by XPath, and this is the XPath to the first team. If it looks long, it is just the class, all the classes at once, that I've picked. It is also a paragraph, so I just pick the paragraph element. The same goes for the second team. Then the scores, these scores here: I'm also getting them by XPath, and this is the XPath, but there is something I do with the score down here. For the first team score, I take the text, split it by the minus, pick the first value, and remove the leading and trailing spaces. It's just a normal string operation. The same goes for the second team score. So these are just the general information: match kind, date time, first team, second team, first team score, second team score.

Then possession. When we scroll down after clicking Stats, we have possession, and possession stands on its own. The other elements are similar in terms of the way the UI is built, but possession is different. So here you will see that I first extracted the first team possession. It is still find_element by XPath, and the XPath is this. Remember the way we got elements in a div that has several elements: we subscript, picking the first one here and the second one here. In that way we have done first team possession and second team
possession. The other metrics had the same XPath, so we just call find_elements and it puts them in a list.

Then this is just a primary key for the general information. Where are we building the primary key? Somewhere here. This is the match kind, the kind of match: a Group A match, a semi-final or something. Then the first team, the second team, and then this is just cleaning: replace, then replace, then lower. So here we have the primary key, which we defined, the match kind, the date, the first team, the second team, the first team score, the second team score, the first team possession, and the second team possession. That's the general information.

Then for the statistical information, we have the match statistics here. What we did is get these div elements for the rest of the metrics. Remember, possession is different, but when we go to Total, considered, Inside the penalty, Outside the penalty, all these metrics have the same kind of styling, so we can find all of them at once using this XPath here. The link to the code is in the description; this XPath is long and it would take a lot of time if I were to describe everything for you, but it works the same way. We get all the elements, put them in an array, and then go element by element. That's why we are looping here over each metric div. For each metric div we find the metric itself and take its text. The metric itself will be, for example, Total here, On Target here, Off Target here. Then the values: for total goal attempts, the value for the first team is five and the value for the second team is six. So here we have the value for the first team, this one, and then the value for the second team;
this is the XPath. They look alike, but if you look closely, here it is mr and there it is ml. Then we make the primary key for the statistics, put everything in a list, and return that information.

So this is where we are now: we have built a function for extracting data, and we can now write code that uses it to extract data for us. We go down to the loop here where we call extract_data. I'm going to define two things on top here. I'm going to define all matches general info, a list that will hold the general information, and a list that will hold the match statistics. So all matches general info is an empty list, and all matches statistics is also an empty list. Then in the loop we append the information that comes from extract_data. General info and match statistics both come from extract_data; remember, extract_data returns general info and match statistics.

Match statistics is a list of lists, because we define match statistics here and then append other lists to it. Each list has its own primary key and also the match primary key, that is, the primary key for the general information. The general information is a match, for example first team versus second team, and each match has several match statistics. That's why each statistic has its own primary key but also the foreign key that relates to the match itself, so that when we analyze we can join the tables: this will be its own table, general information will be its own table, and we can easily join them and do analysis. So of these two lists, this one is a list of lists and this one is just a list, and that is what we're going to be
using here. After doing that, we take all matches general info and just append that list: all matches general info dot append, and we are appending general info. But for all matches statistics we are not appending, we're just adding, because it is a list of lists: it will be the previous value plus the match statistics. Then we animate the scroll, then we quit.

After moving out of that loop, we create data frames for this data. The data frame for matches, which we are calling general info, has these columns: the ID, which is the primary key for the general information that we built up here, this pk here; then the match kind, the date time, first team, second team, and so forth. For the statistics table, the columns are defined like this; I'm just copying them because typing them would take long. This is the ID for the statistics; the match ID is the ID that relates to the matches here; then the metric itself, then the first team value and the second team value.

So let's define the matches data frame. We already imported pandas, so it is pd.DataFrame, and the data will be, no, not all matches statistics, it is this one, all matches general info, and the columns will be these columns here. The same goes for stats DF: we just copy this, these will be the columns, and the data is all matches stats. After that we save each data frame to a CSV. So we do matches data frame dot to_csv, and the path to that CSV is going to be data/matches.csv. We don't want the data to have indexes, so
we're going to have index equal to False. The same thing goes for the statistics, so this one is stats.csv, and this is going to help us scrape all the data. What we're going to do now is run this and see what happens. Before we run, let me create a folder here called data so that all the data will go there. So let's run it. This is going to take approximately 40 minutes, because I ran it before, and you can find the code through the link in the description. Thank you for watching. After running this, I'm going to show you the data.

So, after a very long time, this has completed running and we have two CSVs created here. If we open them, we can see this is all the matches and this is all the statistics. Yep, so this video is complete. Please leave a like and subscribe if you've learned something new. Until next time, take care and have peace.
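The cleaning and accumulation steps walked through above can be sketched roughly like this. It is an illustration under stated assumptions, not the code from the video: the team names, scores and metric values are made up, the key format built by make_match_id is a guess at the "replace, replace, lower" cleaning, and all helper names are hypothetical. The video writes the files with pandas (to_csv with index=False); here the standard-library csv module is swapped in so the sketch runs without extra packages, and a temporary directory stands in for the data folder.

```python
import csv
import os
import tempfile


def split_score(score_text):
    """Split a score such as '0 - 2' on the minus and strip the spaces."""
    first, second = score_text.split("-")
    return first.strip(), second.strip()


def make_match_id(match_kind, first_team, second_team):
    """Build a lowercase key without spaces (assumed key format)."""
    raw = f"{match_kind}_{first_team}_{second_team}"
    return raw.replace(" ", "").lower()


# Made-up values standing in for the text scraped from two match pages.
scraped = [
    ("Group A", "Team One", "Team Two", "0 - 2", [("Total", "5", "6")]),
    ("Group A", "Team Three", "Team Four", "1 - 1",
     [("Total", "3", "4"), ("On target", "1", "2")]),
]

all_matches_general_info = []
all_matches_statistics = []

for match_kind, first_team, second_team, score, metrics in scraped:
    match_id = make_match_id(match_kind, first_team, second_team)
    first_score, second_score = split_score(score)
    general_info = [match_id, match_kind, first_team, second_team,
                    first_score, second_score]
    # Each statistics row has its own id plus the match id as a foreign key.
    match_statistics = [
        [f"{match_id}_{metric.replace(' ', '').lower()}", match_id,
         metric, first_value, second_value]
        for metric, first_value, second_value in metrics
    ]
    all_matches_general_info.append(general_info)  # one row per match
    all_matches_statistics += match_statistics     # several rows per match


def write_csv(path, columns, rows):
    """Write a header row and data rows, with no index column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


data_dir = tempfile.mkdtemp()  # stand-in for the data folder
write_csv(os.path.join(data_dir, "matches.csv"),
          ["id", "match_kind", "first_team", "second_team",
           "first_team_score", "second_team_score"],
          all_matches_general_info)
write_csv(os.path.join(data_dir, "stats.csv"),
          ["id", "match_id", "metric", "first_team_value",
           "second_team_value"],
          all_matches_statistics)
```

Appending keeps one general-info row per match, while adding the lists keeps the statistics flat, so both files end up as plain tables that can be joined on the match id.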
Info
Channel: Dali Codes
Views: 61
Keywords: data science, data engineering, software engineering, machine learning, deep learning, natural language processing, game theory, css, javascript, python, nodejs, vuejs, reactjs, django, hadoop, django rest framework, flask
Id: -OlpkBKJWig
Length: 84min 18sec (5058 seconds)
Published: Tue Jan 03 2023