Time Series Anomaly Detection with ML.NET

Captions
Hi everyone, and welcome to another video. Today we're going to learn about time series anomaly detection, which is basically the ability to take data ordered by time and detect things like spikes, or the beginning of a persistent deviation (a trend). This is great for something like sales: detecting the points in the year where your sales got a spike or were going down quickly within a range of time. It's great for making decisions, and it's not only for sales; that's just one example. We can use it to check the behavior of our users or visitors, to check their reviews, and a lot of other things, detecting real changes that happened because of some factor and taking action based on that. It's great for alerts, and it's used in areas like cybersecurity, customer analysis, business intelligence and many more. So let's see where we can get with it.
Okay, first we need to understand our data, or what we can provide to this algorithm. Here I have a dataset of sales by date, which is basically a dataset of shampoo sales over a three-year period. It's from DataMarket, it's provided by the Time Series Data Library, and it was created by Rob Hyndman. It's a sample dataset, and it has the month and how many sales there were. This could be anything: it could be how many visitors your web page gets every week, or, if you have a way to count how many cyber attacks you're getting, you can detect spikes in them just by having the number of attacks every day, every week or every month. You just need a quantity and a time, and you should be able to detect behavior changes or spikes. In this case we'll be using sales. So now that we understand our data, and we know we have the month and the number of sales, we have to create a class that represents that. I have the month and the sales here in my Sales class, and I can now have an array that will contain all these items.
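The video's C# Sales class and CSV conversion aren't shown in full, so here is a minimal Python sketch of the same idea: one record per row (a time label plus a quantity), loaded in file order. The names `Sales`, `load_sales` and the `Month,Sales` header are illustrative assumptions, and the numbers are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Sales:
    """One row of the dataset: a time label and the quantity measured then."""
    month: str
    number_of_sales: float


def load_sales(csv_text: str) -> List[Sales]:
    """Parse 'Month,Sales' CSV text into a list, keeping the file order.

    The order matters: the detectors report results by position, so the
    list index is what later links a result back to its month.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (assumed to be present)
    return [Sales(month=row[0], number_of_sales=float(row[1]))
            for row in reader if row]  # skip blank lines


# Hypothetical, made-up values for illustration only.
sample = "Month,Sales\n1-01,100\n1-02,120\n1-03,110\n"
sales = load_sales(sample)
print(len(sales), sales[0])
```

Any source works the same way (a CSV, a database query, an API), as long as the rows end up as an ordered sequence of (time, quantity) pairs.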
Let's see where I make the conversion. Here I just take the CSV and convert it to an enumerable, so I can load it into my machine learning context as an IEnumerable. We could load anything, though; we could even load the CSV directly. I just wanted to use an enumerable to make it easy to understand, and because if we want to take the data from a DbContext in .NET or another source, I think it's more common to do it this way.
We also need this SalesPrediction class, which will be our output class. It gives us a double array, and this array contains only three items. First, the alert, which basically tells us yes or no (one or zero) whether a point in our data has a spike or a weird change. Then we have the score and the p-value, which are statistical values; we won't take them into account in our example, because I think we can get what we need just by using the alert.
Okay, now let's see how I make the predictions. It's really simple, and it doesn't require a lot of code. To detect spikes we use something called DetectIidSpike. It provides an estimator, and it requires some basic things: the output will be the Prediction property from the SalesPrediction class, as I mentioned before, and the input will be the number of sales, which comes from the Sales class. You might ask: where is the date? We don't care about the date, because the spikes are detected based on the index. Since the predictions come back in exactly the same order, if it says that the third item has a spike, we just go to the third item and take its label. So it's really easy: it tells us at which point in the data it found the spike. Cool, now that's clear: our algorithm knows which column it needs to read to get the values.
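Because the detector only sees values and never dates, the link between a result and its month is the shared index. A minimal Python sketch of that lookup, assuming each prediction is a list like `[alert, score, p_value]` (the function name and the sample numbers are hypothetical):

```python
from typing import List, Sequence


def labels_with_alerts(labels: Sequence[str],
                       predictions: Sequence[Sequence[float]]) -> List[str]:
    """Return the labels (e.g. months) whose prediction has its alert set.

    predictions[i] is assumed to describe input row i: the detector only
    sees the sequence of values, never the dates, so the shared index is
    the only link between a result and its month.
    """
    if len(labels) != len(predictions):
        raise ValueError("expected exactly one prediction per input row")
    # Only element 0 (the alert flag, 1.0 or 0.0) is used, as in the video.
    return [label for label, prediction in zip(labels, predictions)
            if prediction[0] == 1.0]


# Hypothetical predictions for three months; only the middle one is flagged.
months = ["1-01", "1-02", "1-03"]
preds = [[0.0, 1.2, 0.40], [1.0, 9.8, 0.001], [0.0, 0.3, 0.70]]
print(labels_with_alerts(months, preds))  # ['1-02']
```

Checking only `prediction[0]` keeps the helper usable for any output whose first element is the alert, regardless of how many statistics follow it.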
We also need to set some statistical parameters. One tells it how much confidence it needs before deciding that there really is a spike, that is, how sure it must be before it returns a one instead of a zero. The other is the data size for analysis. We don't care a lot about this one; I'm just dividing my dataset size by four, and that should be enough to get an accurate representation.
Next I need to train my model. To train it I just call Fit on the spike estimator with this data view, and this data view could even be empty here. We could also save the model to a file right away with the ML context's Model.Save method so we could reuse it, but in this case that's completely unnecessary: it's really quick, and with our current data there's no need.
Now that we have fitted our data, or trained our model, we just transform our data into the output. What happens is that it converts our data into an IDataView that has a column called Prediction. Now we can call CreateEnumerable on it, and it maps that automatically: this class has a Prediction property and this IDataView has a Prediction column, so it puts that in there, and that's it. The reuseRowObject parameter just specifies whether to reuse the same object instead of creating a new one for every single item; let's leave it as false, that's not a problem.
So after we run this, we have detected all the spikes, and we get an array of SalesPrediction with exactly the same number of items as our sales array. In each prediction we have an array of three items, and the alert tells us whether that specific point has a spike. We'll check that out later.
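To make the two parameters concrete (a confidence level and a history length equal to the dataset size divided by four), here is a minimal Python sketch of a sliding-window spike check. It is a plain z-score stand-in, not the algorithm ML.NET's DetectIidSpike uses internally; `detect_spikes`, the normal-distribution p-value, and the confidence of 95 are illustrative assumptions.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def detect_spikes(values: Sequence[float], confidence: float,
                  history_length: int) -> List[List[float]]:
    """Flag points that are unusually far from the preceding window.

    Returns one [alert, score, p_value] list per input value. This is a
    simple z-score stand-in, NOT ML.NET's DetectIidSpike algorithm.
    """
    alpha = 1.0 - confidence / 100.0  # e.g. confidence 95 -> alpha 0.05
    results = []
    for i, x in enumerate(values):
        window = values[max(0, i - history_length):i]  # past points only
        if len(window) < 2:
            results.append([0.0, 0.0, 1.0])  # too little history to judge
            continue
        mu, sd = mean(window), stdev(window)
        if sd == 0.0:
            z = 0.0 if x == mu else math.copysign(math.inf, x - mu)
        else:
            z = (x - mu) / sd
        # Two-sided p-value under a normal assumption: P(|Z| >= |z|).
        p_value = math.erfc(abs(z) / math.sqrt(2.0))
        alert = 1.0 if p_value < alpha else 0.0
        results.append([alert, z, p_value])
    return results


# Made-up series with one obvious jump at index 7.
series = [10, 12, 10, 12, 10, 12, 10, 40, 12, 10, 12, 10, 12, 10, 12, 10]
out = detect_spikes(series, confidence=95, history_length=len(series) // 4)
print([i for i, p in enumerate(out) if p[0] == 1.0])  # [7]
```

Like the library version, it only needs the values in order; a higher confidence makes it stricter, and the history length controls how much recent data counts as "normal".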
Now let's see how the other one works. It's exactly the same thing, but the method changes: it's DetectIidChangePoint, and this is the method that lets us find the beginning of a trend, the beginning of a persistent change over time. We provide the same parameters: the output column, the input column, the confidence we need, and how many items from the dataset will be taken into account for each detection, which we'll leave as our dataset size divided by four. Then we train our model, transform our data, and get the predictions the same way. Cool.
Let's check everything. Here I take my sales and create an instance of MLContext, which is something I'm passing in here; as you can see, everything here comes from the ML context, and it's something we have already talked about in some of our other videos. We have the data view, which is basically our data converted into the format that ML.NET uses. We also need to provide the dataset size, because IDataView, at least for this data type, doesn't provide the count of elements, so we have to pass it in here. So: take the sales, create an ML context, load our data so we have an IDataView (a format that ML.NET can understand), and get our predictions. That runs the process: it trains its model, transforms the data, and we get our SalesPrediction array back.
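Change-point detection asks a different question than spike detection: not "is this one point odd?" but "has the level shifted and stayed shifted?". The sketch below illustrates that with a two-sided CUSUM, a different and simpler technique than the one ML.NET's DetectIidChangePoint uses. The function name, the threshold and drift values, and the use of the first `history_length` points as a baseline are all assumptions for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_change_points(values: Sequence[float], history_length: int,
                         threshold: float = 5.0,
                         drift: float = 0.5) -> List[List[float]]:
    """Two-sided CUSUM: flag where a persistent shift in level is detected.

    Returns one [alert, upper_sum, lower_sum] list per value; only element 0
    plays the role of the alert flag. This is NOT ML.NET's algorithm.
    """
    baseline = values[:history_length]  # assumed to be "normal" behaviour
    mu = mean(baseline)
    sd = stdev(baseline) if len(baseline) > 1 else 0.0
    sd = sd or 1.0  # avoid dividing by zero on a flat baseline
    upper = lower = 0.0
    results = []
    for x in values:
        z = (x - mu) / sd
        # Accumulate evidence for an upward / downward shift, minus a drift
        # allowance so that ordinary noise does not build up over time.
        upper = max(0.0, upper + z - drift)
        lower = max(0.0, lower - z - drift)
        alert = 1.0 if max(upper, lower) > threshold else 0.0
        results.append([alert, upper, lower])
        if alert:
            # Crude restart: assume the new level starts at this value.
            mu, upper, lower = x, 0.0, 0.0
    return results


# Made-up series whose level jumps from about 10 to about 20 at index 8.
series = [10, 11, 10, 11, 10, 11, 10, 11, 20, 21, 20, 21, 20, 21, 20, 21]
out = detect_change_points(series, history_length=len(series) // 4)
print([i for i, p in enumerate(out) if p[0] == 1.0])  # [8]
```

With an abrupt jump the alert lands on the first shifted point; with a gradual trend, CUSUM needs a few points of accumulated evidence, so the flagged index can lag the true start of the change.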
Now I do a little display, which basically creates a little plot and displays the spikes and everything, but let me show you how I detect whether there's a spike; I'm not only drawing a little graphic, I'm also showing it in the console. Let's go quickly. I take every single sale that I have and loop through it. First I have the sale item, which at index zero should be this one. Then I get the spike prediction for that index, which has the Prediction property, the double array we defined in our model. The first value of that array says whether that point has a spike or not. The change-point prediction works the same way: its first value says whether there is a change or not. These are the predictions we already got from the methods before. So, as I mentioned, I take the predictions, go to the Prediction property and take the first value; if it's one, it means there is a spike at that sale month, and I add a point to my graphic so it's shown in the display. That's basically it: just by checking the first item of the array, we know exactly where the spike was or where the beginning of a trend started.
Let's run the application and see what we get, and whether the data makes sense. We already have some data here: we know that on January 9 we have a change, which is this one, and on January 11, February 10, March 7 and March 9 we have some spikes. This is great; this is already an alert, something we could use. But let's look at it in a graphic to see whether it makes sense, because we as humans can detect those spikes with common sense, so let's see if the machine did it accurately. Let me open this. I don't think I explained how the labels work: the spike points are shown in red as triangles, the change points in blue as squares, and every single item is shown as a green circle joined by a line; that's the default. As you can see, here I have a change in behavior, something that never happened before in this way: a persistent downward movement in this time range. I also have these spikes here, which, as we can see, make a lot of sense if you look at the chart: this really was a spike at this point in time.
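The display loop can be sketched in Python as well. The video doesn't name its plotting library, so this sketch (function name and sample data hypothetical) only builds the three layers described above and prints the console lines; any charting library could then draw the layers as green circles with a line, red triangles, and blue squares.

```python
from typing import List, Sequence, Tuple

Point = Tuple[str, float]


def build_layers(labels: Sequence[str], values: Sequence[float],
                 spike_preds: Sequence[Sequence[float]],
                 change_preds: Sequence[Sequence[float]]):
    """Split a series into the three chart layers: all points, spikes, changes.

    As in the video's loop, only element 0 of each prediction (the alert
    flag) is checked, and the shared index i ties each prediction back to
    its label and value.
    """
    all_points: List[Point] = []
    spikes: List[Point] = []
    changes: List[Point] = []
    for i, (label, value) in enumerate(zip(labels, values)):
        point = (label, value)
        all_points.append(point)  # drawn as green circles joined by a line
        if spike_preds[i][0] == 1.0:
            spikes.append(point)  # drawn as red triangles
            print(f"{label}: spike ({value})")
        if change_preds[i][0] == 1.0:
            changes.append(point)  # drawn as blue squares
            print(f"{label}: change point ({value})")
    return all_points, spikes, changes


# Hypothetical labels, values and alert flags.
all_points, spikes, changes = build_layers(
    ["1-01", "1-02", "1-03", "1-04"], [5.0, 6.0, 30.0, 7.0],
    [[0.0], [0.0], [1.0], [0.0]],
    [[0.0], [0.0], [0.0], [1.0]])
```

Keeping the layer-building separate from the drawing makes the same results reusable for the console output, a chart, or an automated alert.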
As humans we can see that just by looking at it, so for the sake of the example let me remove the indicators and show the graphic without them. Cool, now we have our graphics with and without the indicators. If we check this, of course it makes sense, and without the indicators we can still detect things by eye. The point is that we did this with code: our machines or our programs can now check this kind of thing and raise alerts, so we don't have to look at graphics every time to detect things and make sure we don't miss anything; the algorithms detect it for us. This could work for a lot of things. Maybe you published something in a specific month that got a spike, and now you can repeat it to get into that trend again. Or maybe a change in quality, or other factors, affected your sales, your reviews, or the usage of your products; all of that can be measured, so you can detect spikes, changes in behavior, and changes in trends like this one. I hope you can see how many ideas this gives you, because a lot can be done just by checking for anomalies in a time series. You see, getting the predictions took us fewer than 11 lines, and that could give you notifications that help you make the right decisions for your business, your YouTube channel, or your computer if you have a memory leak. Any data can work to detect spikes or changes in a trend, so why not use it? I know you can come up with great ideas and solutions with this. Check the code and the documentation for more information; this is ML.NET. If you liked this video, don't forget to press the like button. It's right there and it's easy to click, so it's a waste if you don't use it; that's why the people at YouTube developed it. And if you enjoy this content, don't forget to subscribe; we're always uploading new things.
Check out our other videos; they're great, and some of them include me, so I know you'll like them. Bye bye, and happy coding!
Info
Channel: Hahn Software
Views: 1,295
Keywords: C#, TypeScript, Angular, .NET, dot net, JavaScript, development, webdevelopment, time series anomaly detection, ML.NET
Id: kmPEzLAY894
Length: 15min 20sec (920 seconds)
Published: Fri Aug 26 2022