Oracle and Red Bull Racing Honda Machine Learning Hands-on Lab: October 21, 2021

Captions
Hi everyone, and welcome to this exciting hands-on lab. My name is Victor Martin, technology product strategy manager, and we are very excited about this workshop based on the collaboration between Oracle Cloud and Red Bull Racing Honda. During the next two hours you will build a machine learning model and integrate it into a Python application running on OCI to predict the top five drivers of a grand prix.

This is an entry-level machine learning workshop, so we don't expect you to know much about machine learning. The whole content takes more time to execute than the two hours we have today, and that is why we provide all the notebooks, already executed for you. I will walk you through the code of the Jupyter notebooks, but feel free to follow along with me. The most important takeaway for this workshop is that you understand the process of implementing this machine learning model and putting it into production. To take even more pressure off you, our panelists will get you out of trouble through the Slack channel, as Ettore mentioned.

First, I'd like to introduce some folks that are helping in today's workshop. As I mentioned, I will be your guide today, and we also have other panelists answering questions for you behind the scenes. Ettore is head of developer marketing. Melanie is master principal sales consultant at the Oracle Solution Center. Priscila Vella is technology product strategy director. John Craig is technology strategy program manager. Paola and Rawan are technology product strategy managers. Roxana and Javier are senior specialist cloud engineers on data management. Peort is a specialist cloud engineer on data management, and Tim Graves is OCI specialist director. Thank you all for your support today.

Before we start, here we have the resources needed for today's session. Check the Zoom chat section as well, as Ettore mentioned, to grab the links, but you can also type them into your browser directly. Before today's session we sent you an email with a request to fulfill some prerequisites, including a video with the sign-up process for the Oracle Cloud account. I will start a poll now to see if you have completed the prerequisites. If you have done that, great, thank you, you are already ahead of the game. If not, you have to do it right now, so let us know in the poll so we can allocate more time for you. On the poll we are checking how many people have already completed, and so far, pretty good indeed.

On the screen you can see the bit.ly link to sign up for the Oracle Cloud account, if you still have not signed up for one. The second one is the link to the step-by-step workshop guide, and finally the link to the Slack channel for Q&A. Please join us now in that Slack channel and say hello; let us know your country as well if you want. That Slack channel will be the place to ask for help, so joining now is way better than joining when you have a question and are rushing through the content. For those who haven't signed up for the Oracle Cloud account, please follow the first bit.ly link and sign up now. We have already registered your email, so use the email from your registration and you can skip a few verification steps during the sign-up process. We will also give you 500 euros instead of the traditional 300 euros to consume during the 30-day trial period. You will not need a credit card for the verification of your account if you follow those steps.
In case you need it, you will not be charged for any reason unless you explicitly tell Oracle Cloud to upgrade to a paid account, so you're safe. For those of you that have already completed the prerequisites, thank you so much; bear with me for a few seconds while I give you a brief overview of the workshop.

Here is the agenda for today; let's jump directly into an overview of what we are going to do. Basically, we are going to follow the typical data pipeline for machine learning. First comes data collection, meaning we extract information from formula1.com, from the Ergast API, which provides information about past races, and from Wikipedia for the weather on those days at those specific circuits. Then we step into number two, data preparation, where we merge all of that data into one single DataFrame and also clean that information a little bit. The next step is data exploration, meaning searching for insights in that data and plotting some graphics so we can understand how the data is laid out. The fourth step is to implement different machine learning algorithms and evaluate which one is the best. In step number five we take the best algorithm, deploy it, and expose it through a REST API. In the final step we give you a web application that you can deploy, which uses that REST API with the model to get the predictions. By the end of this workshop you will deploy a Python application that lets you select the circuit and the weather conditions, and the machine learning model will give you the predictions of the top five drivers in those conditions. We are going to walk you through these notebooks, so don't worry about the different labs.

So let's get started. The first item on the agenda is the provisioning, which is very important because at that point you get access to the cloud and can start working with the rest of the content. Let's watch a quick video that walks through the sign-up process, for those of you that don't have a cloud account already. If you want to watch the video, great; if not, no pressure, go straight to the workshop guide to find the documentation covering how to generate an SSH key and the stack that provisions the environment. That will be the first step; everyone needs to do those steps to get the notebooks and the Python application, so feel free to start if you have your Oracle Cloud account fully provisioned.

Hi, I'm Victor Martin from Oracle Cloud. I'm going to walk you through the steps to sign up for an Oracle Cloud Free Tier account. Please follow along and pause the video when you need to. Oracle Cloud Free Tier gives you a set of Always Free cloud services. These services will be available forever: two autonomous databases, two virtual machines, up to four Arm-processor virtual machines; you will also have a load balancer, block, object, and archive storage, monitoring, and much more. As part of your Oracle Cloud Free Tier you will also have a 30-day trial with free credits to use on other services, from analytics to Kubernetes. Sounds good, doesn't it? Are you signing up for an upcoming workshop with us? Please make sure that you use the same email you used for the registration, as we have activated a special offer and you won't need your credit card to sign up. In any case, Oracle Cloud is not going to charge you for your trial; you actually have to explicitly tell us to, once you are happy with all the Oracle Cloud services after your trial.
Let's dive in. You need to add your country, then your first and last name. You also have to provide your email. Oracle Cloud has to check you are human by asking you to resolve an easy challenge. Finally, click "verify my email" to move to the next step. If your email has been activated for the offer, you will see the special Oracle-offer pop-up window with the type of offer already selected; then click "select offer". All the initial information is already in; now you have to provide a strong password. I use a generated one, but you can select any you like, just make sure you comply with the strong password pattern. Confirm the password by typing it again. Fill in the company name if it applies. The cloud account name is essential, as it is a unique identifier for your account; select a name you like that is not taken. The home region is the region where you are most likely to deploy resources. You can deploy to other regions by subscribing to them; however, your free trial only supports one region, so select a region geographically close to you. If you are not sure, I'm selecting the Frankfurt region. At the bottom, click continue. Fill in the address information with city, postcode, county, and your phone number, and click continue. The final step is to accept the agreement and click "start my free trial". Oracle Cloud starts provisioning your account; this takes one minute or two on average, and then you are redirected to the Oracle Cloud web console.

Can you see the message saying your account is currently being set up? It means that some services will still be unavailable for a few more minutes, so keep that in mind. When Oracle Cloud fully provisions your account, you will receive an email; I will show you that in a few seconds. Go to the menu to explore Oracle Cloud services. The search bar is also a friendly tool, as you can search for compute instances, Kubernetes Engine, analytics services, cost tools, user management, autonomous databases, APEX instances, and much more. On the right side you will see the region you selected when you signed up. The next icon is Cloud Shell, a small free Linux instance with several tools pre-installed and configured, like the OCI CLI, and languages like Java, Node, Python, and more. You can close Cloud Shell now; confirm with exit. The next one is announcements, for updates from Oracle Cloud. The following is the help icon, where you can find documentation links, chat with us, post questions in our forums, create a support request, and request a limit increase. The next icon is to change the language, and the final one is the profile icon, where you can see your user information, your tenancy, and the sign-out button. After a while, the message saying that your account is being set up will disappear. After a few minutes you will also receive a getting-started email, and another one when your trial account is fully provisioned; use the button in the email to sign in whenever you want in the future. You are all set up to enjoy your new Oracle Cloud account. Happy hacking!

Perfect, let's jump into generating the SSH key, which is the first step. I have to say it's optional, but it's good practice to know how to do this, so I will share the browser and you will see how I'm doing it. This is the content of the lab; you can open the different steps and go through them. I just want to copy one specific command that I'm going to need and don't want to type on my own, so I copy that and come back to the Oracle Cloud console. Everything was explained in the video, so I will jump directly into Cloud Shell.
Cloud Shell is where you are going to generate the SSH key. It's a Linux environment, so it's a nice way to avoid installing things on your computer; you can use the tools here. I'm going to execute the command. It's going to ask for a passphrase, but I want to leave it empty, so I press enter and confirm that I want it empty. Then I can list the files and see that I have two: one is the private key, and the other, ending in .pub, is the public key. The public key is the one we want to print to show its content, and the one we are going to use in the next step. We can print it on the screen, and I can do that because it's the public one, so there is no security risk. I just copy exactly all the text that I need. Be extremely careful with this, because you can copy one character too many, including whitespace characters that you don't see, and get into trouble; just make sure that you copy exactly that text.

"Sorry to interrupt, can you zoom in a bit? The font size is really small." Absolutely, thank you; jump in at any point, and tell me how the Q&A is going as well. The command is cat, to print the content of the file. This is the content; I copy it, and that is everything you have to do for the first step, generating the SSH key. So we can come back to the presentation; you can do this on your own, don't worry.

Then we can jump into the next step of the agenda, which is to create a stack using Resource Manager. That is basically a Terraform package with all the information; the whole environment for today's workshop is in there. So let's come back to the browser and take a look at what it looks like. It's lab number one, create a stack using Resource Manager. I click there, and the first and very important step is this button, "deploy to Oracle Cloud". That is the one that opens another tab in your browser and redirects you to the wizard to create all of this. I already have all of this running, but I want to walk you through the steps.

Once again, everything is in the lab content, so you can read it there or just follow my explanations, whichever you prefer. I also want to remind you to join the Slack channel: go to the chat section and make sure that you join the workspace and the specific channel for this session, in case you have any questions moving forward. In the stack information step, you create a stack and accept the terms of use; then everything is populated for you, and you really don't have to change anything. The root compartment is okay, it's the way to go; the compartment will be created for you by default, so you don't have to do absolutely anything. Then you can go next, to configure variables, and here is where you have to paste the SSH key. Paste it and make sure that there are no strange characters at the beginning or at the end. Then you can go next, to review, and confirm with create. Make sure "run apply" is already selected, because that will apply the Terraform script and create the whole environment for you. You don't have to do anything else; you click create, and happy days.
I'm going to show you how this looks. You will end up in this stack that I have here, and the apply job will be working, so it will show "provisioning" for some time. This takes around 10 minutes, and then the apply job will show "succeeded", which means that the instance is up and running. If you have any problem at this point, it's usually because you are reusing another trial account that is not active, or you have a pay-as-you-go account without enough resources available, things like that. In that case, let us know in the Q&A, but I recommend going for a fresh free account on Oracle Cloud, as that will streamline the process. If you have any issues, you can copy the error message that you see in the logs. And that is pretty much everything you need for the stack.

The next step in this lab is to go to compute. For that, we go to the menu, compute, instances, and once again everything is explained step by step in the guide. At this point you won't see an instance, and that is because you are in the root compartment, so make sure you open the compartment selector and check the Red Bull HOL compartment. If it's not there, refresh the browser; sometimes it takes a while to show up. As soon as you click the Red Bull HOL compartment, the instance will be there, and then we have the public IP here. You will have a different one, so use your own public IP. I'm going to open another tab and paste the public IP that I have; then you have to add the port, colon 8001, which is where our Jupyter notebook is listening. It will ask for a password; let me make this a little bit bigger. You will find the password in the documentation as well, but it is redbull1 with the R in uppercase. Save, and you will land on the dashboard of the Jupyter notebooks.

As you can see, all the notebooks that you need for today are already there, and I want to call your attention to a couple of folders. One is "from scratch". I don't recommend running all of those, because some of them take a little bit of time, so we have run them for you; but the specific ones that you can run and test are in "from scratch", so if you want to test from scratch, you can open them and make sure that everything works for you. There is another folder that I will mention near the end, which holds the results: some parts of the notebooks are left empty and you have to fill them in, and if you get stuck for whatever reason, the solutions are there, so feel free to use them. That is all for the stack. It takes around 10 minutes, so leave it working while we continue with the rest of the content.

I'm going to come back to the presentation and go to the next step, which gives you five minutes to generate the SSH key and kick off the creation of the stack. Leave it provisioning; as I mentioned, it might take a few minutes. Post your questions on the Slack channel, and see you in five minutes.

Let's see how all of you are doing. Let's launch a poll to check if you have completed the generation of the SSH key and kicked off the creation of the stack. I know this can feel fast-paced, but as soon as you have the stack up and running, you have all the content and all the code, and you can do all the work in your own time.
The most important part and key takeaway of this event is that you understand how to implement this and how everything is laid out; then you can look at the details in the code. So let's launch the poll for this stage: please answer and tell us if you have completed lab one, creating the stack. Even if the stack hasn't finished, that's okay, it takes a few minutes; as long as you are moving along, with the team helping you on the Slack channel, you can move forward with me through the rest of the content.

We are going to jump into the data collection, which is an important step: getting information from the different sources. I'm going to show you how to do it and then you can do it on your own; if you are moving ahead with me at the same pace, that's good, you can execute the code, but feel free to go either way. Cool, we have a 50/50 on the poll, which is fine. Let's jump into the data collection. You can leave the stack up and running, that is not going to affect anything, and just pay attention to the next steps.

Let me jump into the browser once again and show you how the data collection goes. We covered the onboarding for the notebooks, where all the folders are, and so on. What I want to show you is that if you go to "from scratch", once you have the stack and you are logged in, you will see notebooks 00, 01, and 02; let me make this a little bit bigger so you can follow along. Those are the three that belong to the data collection, and we are going to open them one by one and go through the code.

This is Python, with several libraries that we are going to import, and something I want to explain if you are new to Jupyter notebooks: you can run them cell by cell. These are cells that you can execute with the play button, one by one; but you can also execute the whole notebook with this other button, which restarts the kernel, that is, cleans all the state, and executes the whole content from scratch. If you do that, you will see that "Python 3 (idle)" changes to "Python 3 (busy)", and that is the indication that things are being executed, just in case there is no visible feedback and it looks like nothing is going on. That is a nice way to check whether the notebook is running. I mention it because some of the steps take a little bit of time; remember that we are doing data collection, so we have to wait for the requests to come back from the different services we are pulling the information from.

So let's cover a little bit of what we are doing. Basically, we are using the Ergast API, a free service that has information about Formula 1 races, qualifying, drivers, and so on. If you take a look here, we ask for all of this information through this URL. The URL encodes the different things that we want, and we do it by year; that's why we create a range from 1950 to 2021, and inject each year into this part of the URL, which gives us the information for that specific year. We do all the mapping, extracting the seasons, the rounds, the circuits, and so on, and finally we create a pandas DataFrame that will be our dataset for those races.
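To make that concrete, here is a minimal sketch of that per-year request loop, assuming the JSON shape the public Ergast API returns; the exact column names and file paths in the workshop notebook may differ:

```python
import requests
import pandas as pd

races = {'season': [], 'round': [], 'circuit_id': [], 'country': [], 'date': []}

# Loop over the seasons and ask the Ergast API for every race in each one
for year in range(1950, 2022):
    url = f'https://ergast.com/api/f1/{year}.json'
    response = requests.get(url).json()
    for item in response['MRData']['RaceTable']['Races']:
        races['season'].append(int(item['season']))
        races['round'].append(int(item['round']))
        races['circuit_id'].append(item['Circuit']['circuitId'])
        races['country'].append(item['Circuit']['Location']['country'])
        races['date'].append(item['date'])

# Collect everything in a pandas DataFrame and keep it as a temporary CSV file
df_races = pd.DataFrame(races)
df_races.to_csv('races.csv', index=False)  # the notebook's path may differ
```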
Then we start playing around with the information: we print some of it so we can see what it looks like, which is already part of understanding your data, and then we store it in files as temporary storage, so the races end up in a races CSV file. Then we drill down into different details of the data we are getting. We do the same for the rounds and the results; you see, we change the URL, but other than that it is the same thing. We go through the rounds and get that extra information, drilling down into the details, doing more mapping, pulling the information out of the response into our DataFrame, and then we create the pandas DataFrame once again, no big deal. We continue like that for the driver standings: you see the structure of the code, the same pattern with a different URL, then the mapping, and then storing the information with pandas as a DataFrame. We print it, understand it, and store it as a temporary CSV file. We do the same with the teams, nothing new; that's why you don't really need to wait for the stack to complete: you can understand the code now and run it in your own time. We do the same for the constructors, extracting the information we are interested in, and then we print how long it took me to execute all this. All good, and with that we have everything from Ergast: all the information about the races, the years, and the different rounds that happened in each race.

For the weather, there is another file in "from scratch" that you can run if you want, but if not, that's fine. We follow the same structure: we do the imports for the different libraries, and I want to call out a new one, Selenium. Selenium is a way to crawl a website: it goes to the page and extracts information using HTML and CSS tags, so we can find specific pieces of information on the page. In this case it is the weather: we go to Wikipedia and ask what the weather was at that specific race, and we get values like rainy, warm, dry, sunny, and so on. We print the tail of the information just to see what the end of the DataFrame looks like, and then we do a little bit of mapping, because some of the information is in different languages, with different expressions that mean the same thing. Basically, we translate all that into a table, the weather DataFrame that we print here. We print the beginning, and it is a table of zeros and ones, true or false, saying whether the weather was warm, cold, dry, wet, or cloudy. That is the most important part: different weather conditions affect the temperature of the circuit, the temperature of the tires, the type of tires you are going to put on, and depending on that the car performs in different ways. We continue inspecting the information and we concatenate it, so we have everything in one single DataFrame.
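As an illustration of that mapping step, here is a small sketch; the keyword lists and column names below are made up for the example, and the notebook's real tables will differ:

```python
import pandas as pd

# Hypothetical free-text weather descriptions scraped from Wikipedia
weather = pd.DataFrame({'weather': ['Sunny and dry', 'Rainy, track wet', 'Overcast, cold']})

# Keywords that map to each weather flag (real notebooks also handle other languages)
weather_map = {
    'weather_warm':   ['warm', 'hot', 'sunny'],
    'weather_cold':   ['cold', 'fresh', 'chilly'],
    'weather_dry':    ['dry'],
    'weather_wet':    ['wet', 'rain', 'damp', 'shower'],
    'weather_cloudy': ['overcast', 'cloudy', 'grey'],
}

# One 0/1 column per condition: does the description mention any keyword?
for col, keywords in weather_map.items():
    weather[col] = weather['weather'].str.lower().apply(
        lambda text: int(any(word in text for word in keywords)))

print(weather.head())
```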
At some point we do exactly the same as before and store that information in a CSV file, as temporary storage to recover in the next stages. We know how to run it, and we have the Formula 1 information and also the weather.

Next, we get the qualifying information, which we extract from the Formula 1 website. Nothing new here: we have a URL from formula1.com, we get the results, and we extract them into a DataFrame. At some point we start printing that information, getting data about the different drivers and cars, the timing, the session, and the round. With that information we can start getting into the nitty-gritty of the hands-on lab, and, no surprise, we store the information in a temporary CSV file, the qualifying CSV, and we are done with this step. It's very simple: we are just pulling from the internet the information we are going to analyze. We can move back to the presentation for the next steps.

The next step on the agenda is the data preparation. Ettore, let me know if there is any outstanding question in the Q&A that you want me to address live; this is a nice pause in case you are following along and want to catch up. "Not at the moment, Victor. There are guests raising questions in the dedicated Slack channel, which I invite all of you to join, also because the Slack channel will stay open after the end of the session, so you can be in touch with our experts and with Victor in the upcoming days. Don't hesitate, join that Slack channel. Thank you, Victor." I will definitely join after I finish the speaker part and answer any outstanding questions, so feel free; it is an open channel for you, as Ettore mentioned.

Let's jump into the data preparation. We were talking about data collection, getting all the information; now it's time for data preparation, which is basically self-explanatory: we take those files and start aggregating the information. Let's come back to the beginners folder and open the 02 data preparation notebook; I'm going to close the ones we already worked on. Data preparation is about merging all of those files and cleaning the information, making it easy to consume in the downstream parts of the workflow. So, we read the CSV files we pulled in the data collection part, races, results, qualifying, etc., and then we print what the information looks like, so we understand the types of the variables, the size, and how many data points we have. At some point, when we are happy with it, we rename some of the columns to friendly names, drop some columns that we don't need and are not going to use, and print the information to get familiar with what we want. Then comes the important part: we merge the information. For those of you with some knowledge of relational databases, what we are doing here is a simple merge between two tables, races and weather for example in the first row, as an inner join on specific key values: the season, the round, and the circuit.
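A minimal sketch of that chained inner join, with hypothetical file names and key columns along the lines the notebook uses:

```python
import pandas as pd

races = pd.read_csv('races.csv')
weather = pd.read_csv('weather.csv')
results = pd.read_csv('results.csv')

# Inner join races and weather on the shared key columns
df1 = races.merge(weather, how='inner', on=['season', 'round', 'circuit_id'])

# Keep chaining inner joins until one DataFrame holds everything we need
df2 = df1.merge(results, how='inner', on=['season', 'round', 'circuit_id'])
print(df2.shape)
```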
Then we do the same with what we have from before: df1 merged with the results using an inner join, and we keep merging in the results, driver standings, constructor standings, and qualifying. That means the final DataFrame contains absolutely all the information we need. We print it, nothing new there, and then we start making some transformations: basically calculating the age of the driver and converting data types to datetimes. At some point we also drop the nulls, rows that don't give us any information, to make sure the data is nice for consumption in the next steps, and we convert booleans, the typical cleaning you have to do in this data preparation stage. Something interesting, for example, is the work with the nationalities, the constructors, the circuits, etc., dropping some columns that we don't need, and so on. Then we store the final version of the DataFrame in a CSV file, which is how we leave it ready for the next step. Once again, we print how long it took to run, and that is all for data preparation.
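Roughly, those transformations look like this; the column names here are assumptions for illustration:

```python
import pandas as pd

# Hypothetical output of the merge step
df = pd.read_csv('merged.csv', parse_dates=['date', 'date_of_birth'])

# Derive the driver's age on race day from the two dates
df['driver_age'] = (df['date'] - df['date_of_birth']).dt.days // 365

# Rows with nulls carry no usable signal for the model, so drop them
df = df.dropna()

# Convert True/False flags into 0/1 integers so every column is numeric-friendly
df['weather_wet'] = df['weather_wet'].astype(int)  # hypothetical flag column

df.to_csv('final_df.csv', index=False)
```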
So we can come back to presentation mode and move to the next step in the agenda. Once again, and I really want to mention this a few times, this is about you learning and understanding the code. The code is there in your free trial account, and you have 30 days to spend time with it: change the dataset, change the information, reuse the code, change the way the Python is written, do whatever you want. So it's very important for you to follow with me, understand how it is laid out, and then do whatever you want with that information.

Data exploration is the next step, so let's come back to the browser. Let me get to the guide for data exploration; we are going to open a new file, and that new file, let me get there, okay, this is when you don't see the mouse and have to guess where you are, is the exploratory data analysis, which is the data exploration part. Here I want to explain a little bit about the information we have in Formula 1. If you are new to Formula 1, unfortunately there is always a dominant constructor and a dominant driver, and there is a little bit of background you need to understand the dataset. There were periods where Michael Schumacher, Alonso, Vettel, or Hamilton were the dominant drivers. There was a big change in 2013, when we moved from the roaring V8 engine to the hybrid era, a V6 engine with a power unit composed of an electric part and the combustion engine. Things changed a little from then on, but basically everything that happened from, I would say, 2010 until now is more or less stable in terms of results and how the qualifying is done.

Now that we understand a little of the background, it's time to jump into the code, which briefly repeats the same steps as before so you can follow: read the CSV files, merge the information, and print it, nothing new there. Then we do a small cleanup. These are all the columns we have after all the merging, and that is too much; we are not going to use all of that information. One of the cleanups, to drill down into what is really relevant, is to drop those columns, and if we print the columns after the drop, we see that we only have the information we need, which is much more interesting.

I also want to cover some information about the grand prix structure. This is the weekend of the race: three days, Friday, Saturday, and Sunday. Sunday is the day of the race, and that is what we are trying to predict: who are the top five drivers on that Sunday. Friday is a free practice session, where they tune the car and set up the different configurations for the circuit. On Saturday we have three sessions: Q1, Q2, and Q3. All drivers join Q1 and go for the best lap time, and the five slowest are eliminated; the rest move into Q2. Q2 has fewer drivers, and they go for the best lap time once again; again, the five slowest are eliminated and will start in those positions in the Sunday race. The rest move into Q3, where the 10 fastest drivers once again go for the fastest lap time. That determines the grid, the positions of the cars for the Sunday session, which is the race. If you understand that, you understand what we are analyzing when we go through the qualifying data.

After that, we do further cleaning, and we keep only data points from 2010 onward, because that is the most relevant period; it has been more or less the same kind of Formula 1 for the last 10 or 11 years. There is a big change coming next year, so 2022 will be completely different. For those of you that don't know Formula 1, the issue is that, because of that dominance, there is often not a lot of action in the race; the air that the car in front generates creates a lot of turbulence, making it really difficult to follow closely and overtake. So, apart from the visual changes, the wheel size going from 13 inches to 18, more like a typical road car these days, the biggest change is the aerodynamics, coming back to the ground effect, which means less turbulent air behind the car. The analysis we are doing today therefore applies only until this year; after that there will be other changes, and we are working on that, so there will be more advanced machine learning workshops going through different algorithms.

So we are cleaning the data, and something interesting in this part of the screen is that we rename the old constructor teams. For example, in the past we had Force India and Racing Point, which is now Aston Martin; the old Sauber team, which is now Alfa Romeo; Renault and Lotus, which merged into Alpine F1; and Toro Rosso, which is now AlphaTauri. Those renames just clean the information; depending on the data you are analyzing you will have different cases, but basically you are making the names consistent across the DataFrame.
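A sketch of that era filter and renaming, with illustrative identifiers rather than the notebook's exact ones:

```python
import pandas as pd

df = pd.read_csv('final_df.csv')

# Keep only the era the analysis focuses on
df = df[df['season'] >= 2010]

# Fold historical team names into their current identity so the data is consistent
constructor_renames = {
    'force_india': 'racing_point',  # later Aston Martin
    'sauber': 'alfa_romeo',
    'lotus_f1': 'renault',          # later Alpine F1
    'toro_rosso': 'alphatauri',
}
df['constructor'] = df['constructor'].replace(constructor_renames)
```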
Now we jump into some other information; it's rendered really small, so you may have to look closely. We are going to cover the concept of DNF. DNF means "did not finish": drivers that, for whatever reason, either because the constructor's car failed or because the driver had an issue, did not finish the race. We translate that into a did-not-finish flag, a concept that lets us tell whether the failure was a driver issue or a car issue. That gives us information about the confidence of the driver and the reliability of the car, which is very important for the race. That's why we print it and then start plotting, to see visually who is best and who is worst in terms of DNFs caused by driver error. We do the same for the constructors; that is about the reliability of the car: a flat tire, an engine issue, a power unit problem, a hydraulic pressure issue, those kinds of things fall into this category.

As soon as we have that, we analyze some other interesting information. In this case, we look at how drivers perform at their home circuits. A few drivers are really good at their specific home circuits, either because they know the circuit very well or because they have been training there since they were very young, so that is an interesting point as well. We do the same for the podium, and also for the constructors: some constructors, for example Mercedes, Red Bull, and Ferrari, do really well at their home circuits, either because the fans push that extra power into the engine or because they motivate the driver to get a good result. Now we have all the data that we need; we store it in a file, and that means we are done with the analysis of the information.

Now, without seeing the mouse, I'm going to try to share the screen again, coming back to the presentation, and we take a pause. I will give you five minutes to explore the labs we just walked through, labs number two, three, and four, on your own. Post your questions on the Slack channel, and see you in five minutes.

Let's see how all of you are doing. Let's launch a poll to check whether you have understood labs two, three, and four and feel familiar with their content. Now we move to the next one, the important one for machine learning, where we go through the algorithms we are going to use. It's very interesting, and I will take it very easy and very slow, because I want you to understand the machine learning models we are using; that is very, very important. So yes, I'm checking the poll results; in your own time, keep working on the different labs, and we will jump into lab number five in just a few seconds. The poll is stable already.

Next, let's implement the machine learning models. Let's go to the browser once again, into the beginners folder, and open the 04 ML modeling notebook; let me close the ones I'm not going to use. Okay, bear with me, because this is the most important part of the workshop for you to understand. We are going to cover six algorithms. All the content is explained here, so you should really go through every single word in this notebook.
It explains what each algorithm does and what it does best; I just want to give an overview of them. The first one is the classic logistic regression, using the logistic function, no issues there. The notebook explains what is good about it, but basically it's the simplicity: it's a binary classifier, so it pushes the values toward one and zero, and the most important part is that it's very simple.

We are also going to use the decision tree and the random forest, which are related, so let's go to the section where we compare them one to one, because I want to explain the difference in case you are completely new to machine learning. We are trying to understand the data and learn from it from the point of view of the machine. We analyze features, the different attributes of the data points, and a decision tree is as basic as: okay, I have these features, I need to make a decision, I go left or right. That is exactly like walking in nature along a pathway, and suddenly there is a fork that takes you left or right, and based on the information you have, you make that decision. Imagine three features: it could be the weather conditions, it could be the wind speed, but it could also be the color of my t-shirt. You make a decision based on that information, go left or right, and you learn from the error if it wasn't the right choice.

The random forest is a little more specialized. Maybe the color of the t-shirt doesn't really help with deciding whether to go left or right; it's not that relevant. What the random forest is good at is picking some features randomly and running a decision tree on those. That is interesting because we get a variety of features, we can spot which ones are more relevant and which ones are less relevant, and then we learn from those left/right decisions and the error measurement. That is the biggest difference between the decision tree and the random forest.
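To see that difference in code, here is a small comparison on a toy dataset; this uses scikit-learn's standard API rather than the workshop's exact notebook code:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# A toy dataset standing in for the race features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# One tree makes a single chain of left/right splits over the features;
# the forest averages many trees, each trained on random feature subsets
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print('tree  :', cross_val_score(tree, X, y, scoring='accuracy').mean())
print('forest:', cross_val_score(forest, X, y, scoring='accuracy').mean())
```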
Now, the support vector machine is a little more complex, because there are a few more mathematical concepts behind it, but explained very simply: if we are in one dimension, we have just a line, and there is a point that acts as the threshold that lets me decide left or right, or in this case red or green; that is the class we want to predict. Imagine that instead of one line with a threshold at a single point, we have two dimensions, an x and a y, say the height and something else; in that case, instead of a point, a line segregates the information into red or green. At some point we have more variables, more features, and that is when we need planes: a plane is basically like a piece of paper that, once again, segregates the information into our two categories, red and green in this example. A vector is a nice mathematical way to represent planes in multiple dimensions, and this is where everything goes a little crazy, but basically it's a way to adjust the parameters of that plane, the vector representing it, to be able to classify the information. That works really well for machines and, as you see in the definition, it is good for high-dimensional spaces, because it's really fast in those cases.

Then, one of the algorithms everybody talks about in one way or another is Naive Bayes; we are going to use the Gaussian one. Basically it works with probabilities: it works with the histograms of the information, and based on the probability of each case being true or false, it calculates a probability, and that gives us a process for learning how to classify data points.

And probably the simplest one to explain is k-nearest neighbors. If we have data points plotted over x1 and x2, we have green and yellow categories, category A and B, in the example on the left. Then we add a new point, and we don't know what it is, so we try to figure out what it could be, and the way to do it is to look at which existing data points are closest. In this case it's closest to the green ones, so we decide it's green. And when it's right in the middle and you are having a hard time, you define a hyperparameter, just a parameter, which we call k; that is where the name comes from. That k is the number of closest neighbors we look at, and it works like a vote: okay, of the 10 closest, how many are in the green bucket and how many in the yellow bucket, and whoever wins decides whether the new data point is green or yellow. So those are the algorithms we are going to use.

Something else you have to learn in one way or another is how to analyze the data. We did a little of that in previous labs, but basically it's about plotting the information. I'm really a visual person, so I love this part: when you plot the information, you really see the trends, you really see what is happening with the data. The types of plots we usually use in machine learning are the scatter plot, for data points on a graph; the histogram, for how frequent different values are; and there are more complex ones, like the heat map or the box plot, which we will use later in this workshop.

Another important concept, for those of you new to machine learning, is splitting the dataset. You have a big dataset, the bigger the better, but you use part of that information for the training, and another part for the testing, to validate your learning. That is very important.
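That split is one line with scikit-learn; a minimal example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out part of the data: train on one slice, validate on the unseen one
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)  # e.g. (800, 20) (200, 20)
```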
What we are trying to understand is, as the saying goes, that all models are wrong, but some are useful; we are trying to find the useful one. For that, we evaluate the model, and this brings up an important concept in machine learning: underfitting and overfitting. "Sorry to interrupt, Victor; the microphone is probably rubbing on your shirt and adding noise." Okay, thank you so much; let me know if it keeps doing that and I'll switch microphones, maybe to the laptop one. It might sound like an echo, but maybe it's better; let me know how it goes.

So, we split the information into training and testing, and we try to identify how good we are with that algorithm. Underfitting means we do badly across the board: you are doing badly in all the learning, and you have to improve the algorithm, the parameters, how you learn. Then you move to the other concept, overfitting. Overfitting means the model is very, very good, but only on the training data, that first split of the information; when we test on the held-out set, where we know the answers and use them to validate, we don't perform that well. That is equally bad, because we perform really well on specific known data but we cannot predict the new data points we will get in the future. It's a thin line, and this is where machine learning becomes a little bit of an art: you have to understand the different tricks for avoiding overfitting. Important related concepts, which you will find as you read the documentation, are bias and variance; they are probably outside the scope of this presentation, but it's really important to understand the difference between the bias and the variance.

And now, finally, thank you for your patience, we arrive at the code. There is not a lot new to mention: we import a few more libraries, basically the ones for the different algorithms. As you can see, we have logistic regression, random forest, SVC, decision tree, and k-neighbors; those are the algorithms we are going to use, plus some helpers like cross_val_score, which I will explain in a few seconds. Nothing new otherwise; it is what we have done in the other labs. We read the information from the previous stages, the data that is filtered and ready to consume; we print it to make sure everything has been read the way we want; we check that the size of the data is correct and makes sense against what we expect; and then we start working on the different pieces of information we are going to use.

One interesting feature we want to analyze, which we mentioned before, is the confidence of the driver. That is based on the number of times the driver hasn't finished because of an error. We group the data by driver, get the total number of races that driver has done, and combine that with the number of times the driver did not finish. That gives us a DNF ratio, the did-not-finish ratio, which is exactly the confidence of the driver: if I have been failing at many grands prix, I will not be that confident during the race. This is how we print that information, and we see different values; for example, Jenson Button is at 94%, and values run from 80-something percent through 90-something up to 100% in some cases. That is the level of confidence, which is really important to understand.
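A sketch of that ratio computation, assuming hypothetical column names such as driver_dnf:

```python
import pandas as pd

df = pd.read_csv('final_df.csv')  # hypothetical merged dataset

# Total races per driver versus races ended by a driver error
total = df.groupby('driver')['round'].count()
driver_dnf = df[df['driver_dnf'] == 1].groupby('driver')['round'].count()

# Confidence = share of races the driver actually finished
confidence = 1 - (driver_dnf / total).fillna(0)
print(confidence.sort_values(ascending=False).head())
```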
Then we do exactly the same with the constructors, which gives us the reliability of the car. It's basically the same: we take the data, group it by constructor, and get the total number of races and the ones that ended due to a constructor error. In that case we create the did-not-finish ratio for the constructors and print it so everything makes sense. You can check that, for example, Mercedes and Red Bull are at 80-something percent, while others like McLaren are at 61 percent, or even Marussia, a few years back, at 16. That information is really relevant, so we aggregate it into our dataset.

Then we start working with the active constructors and the active drivers, because we don't want results for Niki Lauda, for example, who is not a driver anymore; this is just a way to filter the information. We use a lambda function, which is an anonymous function in Python, to decide whether each row is an active driver and an active constructor or not. Then we print the data, and that looks really good: we have the values, and all the columns are good for us.

Then we start working on the model. We make sure we understand the information, and we start the preparation for machine learning. If you see here, we take the information for the active drivers and create the StandardScaler and the LabelEncoder. What is that? Basically, machine learning works with values; a driver called Jenson Button tells the machine nothing, to be honest. So we transform all of that into numbers, and those numbers we normalize. Normalizing means the value ends up between 0 and 1, which makes things easier, because otherwise bigger numbers dominate: imagine we are looking at the number of races, someone who has been around for many years would look better to the machine if we left it like that. So we normalize, so everything goes between zero and one; that is where we use the StandardScaler, for normalizing, and the LabelEncoder for converting labels, text such as names of circuits and names of drivers, into values: 0, 1, 0.7, whatever the value ends up being.

At some point we are ready to run the cross validation of the different machine learning models. That means we create the models we are going to use, back to the six mentioned at the top: logistic regression, decision tree, random forest, etc. We keep their names as well, because that helps us print the information later. One interesting part is this function: we go through every single model and run a cross validation, which basically says, this is the model, this is the data, tell me how good we are; the scoring is the accuracy. We print it, and we see that each model reaches a different level of accuracy. If we plot that information, it tells us that some do better than others; for example, here we have the random forest on top, while the Gaussian Naive Bayes is not doing that well. So we can see which one is best for our case and move forward in that direction.
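Putting those pieces together, here is a condensed sketch of the encode, normalize, and cross-validate flow; the podium target and column names are assumptions, the remaining columns are assumed numeric, and the real notebook keeps more features:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv('final_df.csv')  # hypothetical cleaned dataset

# Turn text labels (drivers, circuits) into numbers the models can digest
for col in ['driver', 'constructor', 'circuit_id']:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=['podium'])
y = (df['podium'] <= 5).astype(int)  # hypothetical target: a top-five finish

# Normalize every feature so big raw numbers don't dominate the learning
X = StandardScaler().fit_transform(X)

models = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'decision_tree': DecisionTreeClassifier(),
    'random_forest': RandomForestClassifier(),
    'svc': SVC(),
    'gaussian_nb': GaussianNB(),
    'knn': KNeighborsClassifier(),
}

# Cross-validate each candidate and compare mean accuracy
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(f'{name}: {scores.mean():.3f}')
```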
We do the same with the constructors: we analyze the information from the constructor point of view, with basically the same code, just changing the data that we plug in, and once again we analyze the results. Once again, it is between the SVC, the support vector machine, and the random forest. This is an exercise we probably don't have time for live, so what I'm going to do is copy the results from the results folder; basically we do the same with both drivers and constructors together, and that is the interesting part: we use our machine learning models to learn which one performs better. I'll execute it or not depending on the time; you have these empty cells that you can fill in to do the cross validation and implement the box plot. I'll try to do it, and then we jump into the model that we are going to use.

Okay, I think I have time, so let's go really quick. I'm going to open the solutions notebook, and you will see that here we are working with the same file, just getting a little more information about the constructors and drivers. This is the code that we want; the cleaned part is there, I believe. Yes, that is the way we clean the information. Now we jump into the dataset we want: the X is the information we want to train our model on, so let's copy that. If I were you, I would try to do it on my own first, and if you fail, then go to the solutions; I'm copying just to save a little time. The next step is to filter the dataset: we take active drivers and active constructors. You see, it's very similar to what we did before for the constructors and drivers separately; it just merges both things, so let's paste that. Now we create the StandardScaler, which we know already scales the numbers to normalize them, and the LabelEncoder, which we apply to the GP name, constructors, drivers, etc. Let's put this here. The final step is to prepare X, the feature dataset, and y, the values to predict; so "implement X and y" is those two. Basically we take the features we want, active driver, active constructor, and the rest, towards the position we want to predict. Let's move that, and I think that is all. For the cross validation we copy this, but we can also just go and check the results there, which is probably what I'll do anyway; the plot is just that.

Then I will leave it running so you see how I run the different notebooks. In this case I'm going to plot the results, and what I'm going to do is restart and execute the whole thing: I click there and confirm Restart and Run All. You see it is busy now, executing the cells one by one, and you can check it as it moves along; you see it doing the cross validation, getting the results, plotting the values. All of this is new output we are getting: random forest now at 94%, then the rest of the algorithms. That is how we execute all of this.

These are the graphics we use to analyze which model performs better, and we see that the red one, the random forest, performs very well across all the data: only drivers, only teams or constructors, and the combined data. That means we have made up our minds; random forest looks like the one to go with.
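The comparison plot is a standard box plot; here is a sketch with made-up accuracy numbers, purely to show the shape of the code:

```python
import matplotlib.pyplot as plt

# Hypothetical cross-validation accuracies collected per model in the loop above
results = {
    'logistic_regression': [0.88, 0.90, 0.89],
    'random_forest': [0.93, 0.94, 0.95],
    'svc': [0.91, 0.92, 0.90],
}

# A box plot makes it easy to see which model scores best and most consistently
plt.boxplot(list(results.values()), labels=list(results.keys()))
plt.ylabel('cross-validation accuracy')
plt.show()
```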
Let me come back and share the presentation, because now we move on to model serving. Now that we have decided which model to use, we tune its parameters and then serve it so it's ready for consumption from the UI. Back in the browser, we close the notebooks we don't need and open 05, ML modeling.

As usual we import the libraries and load the data. The only new thing here is a library called pickle: pickle serializes a Python object, and that is how we store the machine learning model as a binary file, the file we will use later. We filter the data and print it, which is very standard by now. We calculate the driver confidence and its different components, and do the same for the constructors; that is just a replication of what we have done before.

Then we analyse the teams' probabilities after the last race. This looks at the 2021 data: we go to a specific race to check and validate that everything works the way we want. I don't want to spend a lot of time here because we have already covered it; it prints the information, the points by constructor, and generates a qualifying dataset for the predictor, so basically we are measuring how good we are. There is no new code there that you are not familiar with by now.

Finally we generate the machine learning model. Random forest is the one that performs best, so we create a function that uses the pickle library to store the model in a file. We import the libraries, nothing new, and in the cross-validation score we use logistic regression and random forest, basically the same as before. The most important part is that we search for the best parameters for the random forest: depending on the library, you define a set of parameter values to try. Then we read the data and work with the active constructors and active drivers, something we have done in the past; this notebook is standalone in that sense, so it loads the general data again and applies the random forest. This is where we create the random forest object and start fitting it: that is the learning part, where everything happens. We pass in the features and the supervised target values, and afterwards we read the best parameters from the search: the number of estimators, the minimum samples per split, the minimum samples per leaf, the settings where we perform best. We build the model with those settings, evaluate it with a confusion matrix and related metrics (feel free to take a look at that code), and finally save it in a specific folder as a .pkl file, a binary file that contains everything the website needs to make predictions (a sketch of this tuning-and-saving step follows below).
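A minimal sketch of this step, assuming scikit-learn and the standard-library pickle module; the parameter grid, file name, and stand-in data are hypothetical, so the workshop notebook's actual values may differ:

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in feature matrix and target; the notebook uses the prepared
# drivers + constructors dataset instead.
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, 200)

# Hypothetical grid over the parameters mentioned above: number of trees,
# minimum samples to split a node, minimum samples per leaf.
param_grid = {
    'n_estimators': [50, 100, 200],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring='accuracy')
search.fit(X, y)
print('Best parameters:', search.best_params_)

# Serialize the best model to a binary .pkl file ...
with open('race_predictor.pkl', 'wb') as f:
    pickle.dump(search.best_estimator_, f)

# ... which the web application can later load back to make predictions.
with open('race_predictor.pkl', 'rb') as f:
    model = pickle.load(f)
print(model.predict(X[:5]))
```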
As soon as we have the saved model, we analyse something very interesting about the features. There is a nice graphic here that I want to cover: after applying the random forest algorithm, we see that the qualifying position, the grid position you earn in Saturday's session, is very important. If you start at the top it's probably because you are faster, which usually means you finish near the top on race day. That is a no-brainer, but it validates that our machine learning makes sense, and that is something you should always check when building your own models. Also important, although some way behind qualifying position, are the driver confidence and the constructor reliability: if the car fails there is nothing you can do, even starting from pole position, and driver confidence clearly matters too. We saw in the exploratory lab that even something as simple as racing at your home circuit makes a difference, so the confidence and mindset of the driver count. Then there is an overview of how everything was analysed and plotted from the random forest algorithm: the colours represent the features, so you can see how the model used them, and we can also inspect the individual estimators and how each one learned in a different way. Finally, we will see the results in the last part, the UI (a sketch of the feature-importance readout follows below).
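For illustration, a short sketch of how such a readout can be produced from a fitted random forest via its feature_importances_ attribute; the feature names and training data here are hypothetical stand-ins:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names mirroring the discussion above.
features = ['quali_position', 'driver_confidence',
            'constructor_reliability', 'circuit']
X = np.random.rand(200, len(features))
y = np.random.randint(0, 2, 200)

rf = RandomForestClassifier(random_state=42).fit(X, y)

# feature_importances_ ranks how much each input contributed to the
# forest's decisions; in the workshop, qualifying position dominates.
order = np.argsort(rf.feature_importances_)
plt.barh(np.array(features)[order], rf.feature_importances_[order])
plt.xlabel('Importance')
plt.title('Random forest feature importance')
plt.tight_layout()
plt.show()
```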
At this point, let's come back to the presentation and I'll give you five minutes to explore these two labs. They are the most important part, so go through them, understand them, implement the machine learning models and the model serving, and post your questions in the Slack channel so the team can help you. See you in five minutes.

[Music]

Let's see how all of you are doing; I'll launch a poll to check whether you have understood labs five and six. I'll use this pause to remind you that we have compressed five hours of content into just two, so I am walking you through the content so you understand it and can start making your own changes to the code. I even invite you to test it with the latest Formula 1 data and see whether you can adapt the application, which we are about to cover. The code is there, you have created your cloud account (the step where you are most likely to hit issues and benefit from the team's help), you have your stack with the whole environment ready to go, and you know how to access everything, read the code, execute it, and move through the folders. That is very important.

Now I'll show you the UI part. Let me go back to the browser; I'm just checking the poll, and yes, everybody is moving forward, very good. We are going to open a terminal inside the Jupyter environment. If you don't see a launcher, the screen where you landed at the beginning of the workshop, you can always open a new one. In the launcher you have different options: you can create a new notebook, which I really recommend so you can start playing with the libraries, but the important one for this lab, the UI part, is to open a terminal.

If you list the files there, you will see two executables; one of them is launch_app.sh. All of these steps are in the workshop content in case you don't feel at home on the Linux side; I'm just walking through them. The first command is launch_app.sh start. It doesn't return anything; if you get any error, this is the point where you would see it, but otherwise the application is now running inside your virtual machine, the same one where the Jupyter notebook is executing. So I copy the public IP, because the application is on the same host but a different port: instead of 8001 for the Jupyter notebook, I use the same IP followed by a colon and 8080, the port where the application is listening.

Here is the application, the Formula 1 predictor, and we can select different circuits. I'll pick Monaco in the wet, because I think that is fun, and predict the results: it tells me that Max Verstappen, with Oracle technology and Red Bull Racing Honda, is going to win that race in those conditions. You can play around and get different results: for example, Silverstone in warm conditions puts Lewis Hamilton on top, with Max Verstappen and Bottas behind him. That is how we consume the service, and this is one way to put your machine learning models into production. What is really interesting is not just that you can build a machine learning model, but that you can unleash that power in a website; this is the final step that consolidates all your work. You can always stop the application: go back through the shell history and run the same script with stop instead of start, and that shuts the web app down. I think that is all from the demo point of view (a hedged sketch of how such a model-backed endpoint can look follows below).
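For illustration only, here is a minimal sketch of how a pickled model can back a web endpoint, assuming Flask; the workshop ships its own prebuilt application behind launch_app.sh, so the file name, route, and payload format here are hypothetical, not the actual application code:

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical file name; the model saved earlier with pickle.
with open('race_predictor.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON list of already-encoded feature rows.
    rows = request.get_json()
    return jsonify(predictions=model.predict(rows).tolist())

if __name__ == '__main__':
    # The workshop app listens on port 8080 on the VM's public IP.
    app.run(host='0.0.0.0', port=8080)
```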
Let me come back to the presentation. This is also a good moment for you to catch up with all the content we have covered, so I'll give you another five minutes to explore lab 7, the UI part, and to catch up with everything else. If you feel the most important part for you was the six algorithms and that part of the code, that's fine: go there and spend more time now, we are going to support you, and I will keep checking the questions over the next few days. Feel free to come back and let us know how you are doing; if you build something amazing with this content, I would love to know. See you in five minutes.

[Music]

We are at the end of the event, so let's check the next steps and recap. Before Ettore comes back and wraps up, I want to say thank you to all the panelists first of all. I love to learn and this has been a lot of fun; I hope you have the same feeling of having learned a lot about the basics of machine learning. You now have all the code and the environment up and running, so you can keep spending time with this code and get to know machine learning: how to apply it and how to put it in production. I know you have been hearing my voice a lot this morning, so thank you so much for your patience; now it's time for you to enjoy.

Thank you, Victor. This is the end of today's "race", although, as I mentioned in the chat and in private conversations with some of you, this is not a race: we are here so you can benefit from experts who can support and guide you. That is why we created the Slack channel, where you can continue to post and raise your questions. The material we shared today is public, so you can keep practicing with the content. There is a recap here of the bit.ly links, which you will also find in the reminder emails, and we will send a thank-you mail with the link to the recording of today's session.

You also have additional resources. I put the link to our feedback survey in the chat; it is important for us that you share your comments and feedback so we can learn what to improve. Then there is a link to the next masterclass; this is part of an umbrella program we are running for the developer community in Europe, the Middle East and Africa. The next one will be on the 11th of November, on how to deploy an application and combine a digital assistant with it, two hours where you can learn something more and discover something different with Oracle. There will also be another Red Bull Racing event at the end of November, so my advice is to follow the Oracle Developers Red Bull page, where we will post the new sessions, along with the lab guide and the Slack channel to stay in touch with us.

Last but not least, you will get a bit.ly link to take a quiz based on today's content; it's an easy way to earn your digital badge for this session, which I hope you will share on your social media channels to create some buzz. And one more thing that I think is very interesting for all of you: until the end of the calendar year, the 31st of December, a set of Oracle University trainings and certifications are free. I really advise you to visit that page, because there is the possibility to get certified on OCI in different areas, grow your skills, and add something new to your CV and your career. So do take a look; they are free until the end of the year.

And now we really are at the end. Thank you to all the panelists supporting you in the Slack channel, really good insight from that side as well, and above all thank you to you, our guests, for participating in today's event. I hope you enjoyed the content, and speak to you soon.
Info
Channel: Oracle Developers
Views: 457
Keywords: oracle cloud, oracle developers, oracle database
Id: n4-Azr8xLQE
Length: 105min 21sec (6321 seconds)
Published: Tue Oct 26 2021