MANJU DEVADAS:
Welcome, everybody. I hope you've had a good morning so far, and I hope many of you attended the keynote session. Let me introduce myself, and I'll also quickly introduce my co-speaker. My name is Manju Devadas. I'm the founder and CEO of Pluto7. We are one of the key
partners for machine learning and AI implementation
for customers on GCP. And it's my honor
and pleasure to be co-speaking with
Prashant Dhingra, so he'll be joining
onstage very shortly. And a few things
I want to mention about him is, he is one
of the people at Google I've actively
followed for a while. And many of you may have
seen him on YouTube, at various sessions. Very insightful. Buckle up your seats if
there are seat belts. You're going to see some
pretty incredible demos and walkthroughs of
real world problems that are being solved on
Google Cloud with machine learning and AI. So before I pass it
onto him, I would like to mention a
few things and also walk you through one of the
case studies that we solved. What we essentially do is go to customer sites and look at, from an innovation angle, what Google machine learning and AI technologies can do to solve their real-world problems. I'll go into one case study. There's also a
deeper session later in the day on this case study. So while they're bringing
up the right slide, let me give you some
introduction of what I'm going to be talking about. Improving the taste of beer. What do machine learning and AI have to do with improving the taste of beer? That's the question
that was posed to us less than six months back. And what I'm going
to walk you through is our work with one of the world's largest breweries. I'm pretty sure eight out of
10 beer drinkers in this room drink that beer. And I can't tell you the name. You'll figure it out eventually
in the later session today. Essentially, they said, OK,
we think machine learning and AI can do something. But we don't know exactly what. You're the supply chain and manufacturing domain expert. Come and show us what it can do. In short, when we solved it, it was an eye-opener for them. Not only did it save millions of dollars, it also improved the taste of the beer, which is a good thing. Having said that, I'll walk you
through that a little bit more. If you saw the keynote
from [INAUDIBLE] this morning and
the demos, those were some of the very
initial examples. There are tons and tons of examples of use cases being solved with prebuilt machine learning models-- AutoML, custom ML
models and so on. So I'll walk you
through one example. And then, of course,
during Q&A, we'll have a lot more to
speak about and answer some of your questions. When it comes to
predictive maintenance and predictive
monitoring, Prashant is going to walk you through
some of the deeper demos. But let me set the context
and a little bit of thinking. Machine learning and AI,
why is it a big deal? If I had told you in '93 or '94, in an enterprise setting, that the internet was going to change your world, the reaction would have been: yeah, connecting computers would be useful, but I'm not sure how it's going to change my business. Today, nobody needs to explain how the internet changed your business. If you take machine learning and
AI, in my simple explanation, we rely on computers to do
a lot of different things-- computations, storage,
organizing, searching, finding, but ultimately, the
decision-making we humans want to have and control. Just like driving the
car, it's very hard for us to give complete control
to the driverless car. It's the same thing in
enterprise decision-making, whether it's invoice
processing, or replacing a part, or planning a
shipment and so on. We want to be in control. When you break
down these problems and allow a machine to
do it, in some cases, it might do a better
job and help us. So it is this
decision-making that we are talking about when
we say let the machine crunch through the numbers,
look for the patterns, and make better
decisions than humans. Now, I just spoke to
you about the beer, one of the most consumed beverages apart from water. When we said we'd improve the taste of beer with machine learning
and AI, it was something that everybody wondered. What would that be? What has that got
to do with taste? Now let me go a little bit into the detail. So as most of you may
know, creating a beer involves fermentation
in a kettle and then filtering the beer to
get it into your bottle or can. And for that, when is the beer ready to be transferred into a bottle? In most breweries, it is mostly human judgment. It is looking at the color of the beer, the particles, and what's called turbidity. There are a few things that humans are involved in. In this particular
brewery, there was a 30-year
experienced brewmaster who made the judgment. We'll talk about his
accuracy level in a minute. But essentially, the problem
we are solving here is that the beer flows through
the kettle, passes through the filter, and
then, the clear beer comes out of the other
side of the filter. And then, it gets bottled
or stored in a can. Now when you
replace this filter, here is the key problem. The problem costs millions of
dollars when it's done wrong. So as you're processing the
beer, as you are bottling the beer, the
color of the beer-- there's good beer and bad beer-- turns from good to bad to worse. You want to catch it right before it goes bad. Ideally,
you want to catch it at the very right time. And now this is a
human decision-making looking at the taste
and color and so on. And then you say, oh,
the filter has gone bad. Now, let's replace it. You replace it too early,
it's not good, or too late, it's not good. Essentially, if you
replace it too early, you're replacing a
filter that costs-- and it's not just the filter cost. You bring production down. You're doing it on a Monday at 11:00 AM, which was not expected. Now there's your labor, and then your transportation, your trucks, and so on. This one brewery makes 100
kegs of beer every month. So think about the
magnitude here. So in short, when a
brewmaster made the decision on when to replace the
filter, he was 60% right, which means 40% of the time he made a bad decision. Essentially, he
got it 40% wrong. Now how did he
make this decision? He made his decision through
his 30 years experience. He made his decisions
through, OK, I look at temperature,
pressure, turbidity. There are four or five
different key data points, which he believed in. I call it human bias. And for the most
part, they are right. But sometimes they're wrong. And that's kind of what we want
machine learning and AI to do something for us. And there is no magic here. It is more number and data crunching. And when we say number and data crunching, you look for data patterns in your neglected data. When I say neglected data, I mean data that you neglected, because it's too hard for you to look across all the columns, across all the rows, and identify the patterns. Or it's humanly not possible for you to identify the exact scenario where the data patterns occur, such that it's telling you the filter has gone bad. In the ERP, they had 200 columns of data. And they only relied on three columns, even though the rest carried tons of useful information which they couldn't use. Now what's the difference
with what we did? Again, like I said,
there is no magic. We just looked through all the
available reasonably meaningful columns, which is more commonly termed feature engineering. We identified the columns which are most relevant. And we built a machine learning model on GCML. Now OK, a machine learning model, in case anybody is new, in my simple terms, is just mimicking the decision-making process of beer filter replacement. Take the decision,
mimic that into a model, and deploy it on GCML. It's really as simple as that.
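To make that concrete, here is a minimal sketch of what such a filter-replacement classifier could look like. The CSV file and column names are hypothetical placeholders rather than the brewery's actual schema; the point is only that a handful of relevant sensor columns plus a historical outcome are enough to train a first model.

```python
# Hypothetical sketch: classify whether the filter should be replaced,
# using a few engineered sensor columns (file and column names are made up).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

runs = pd.read_csv("brew_runs.csv")  # historical runs with a known outcome

features = ["temperature", "pressure", "turbidity", "flow_rate"]
X = runs[features]
y = runs["filter_needed_replacement"]  # 1 = filter was actually spent, 0 = still good

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

A model along these lines can then be exported and served, for example from Cloud ML Engine, so that every run gets a prediction alongside the brewmaster's judgment.

But again, there is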
a lot of complexity you'll appreciate when
you go into solving more and more business problems. Now, all of these things, when they are done, are not done over a three-month, six-month, or year-long project. The experimentation is done in weeks. And you have to show the results, results that they can believe in. And the best proof is that when they asked me to fly back and present the results, it was not me presenting. It was the brewmaster presenting, saying, I can't beat this model. It's too good. Now we need to take
it to production. So essentially, it's not
just building a machine learning model, but
it's also making sure that your stakeholders
[? believe. ?] Many of them are new to machine
learning and AI. They need time to comprehend. At the end of the day,
you're tying your machine learning models, your
processes, and the information that you find together into
a machine learning model that you deploy. These are pretty much
the high level steps of machine learning
model deployment. When you look at this
on the left side, there are really two main components to deploying the model. First, you train and apply the model locally. I won't go into the details of training and deploying these models, because
there's another use case that Prashant is
going to share, where you're going to get a
flavor for what it looks like when a model is running. So in other words,
predictive maintenance is one of the key topics that
many companies around the world are watching very closely. Because there are numerous
decisions that get made in the manufacturing and supply chain world. And with that, what you are really talking about is direct ROI impact in the form of saving money and
increasing productivity. So with that, let me pass it on
to Prashant who has some very interesting demos to show you. Prashant. PRASHANT DHINGRA:
Thanks a lot, Manju. [APPLAUSE] Thanks a lot, Manju, for
the great case study. My name is Prashant Dhingra. I'll walk you through two cases. We'll showcase a case study of how you can deploy a predictive maintenance model on a [INAUDIBLE] data set. And we will also showcase a second case, where you will use river data. And we will see a scenario of how you can predict the water flow in a river. So the common use cases
for predictive maintenance are that companies want to predict which machines or which devices are going to fail. So in machine learning terms, we call it a classification problem, where
you have a set of sensor data. As Manju showed you, it was
a classification problem, like whether this filter
is spoiled or not. So which machine will fail? Which device will fail? Or which car will fail? That is one kind of scenario. The other kind of scenario in predictive maintenance is: what is the remaining life of a machine? So if you have an
oil rig, or if you have an engine in
an aircraft, what is the remaining life of it? If you have a battery,
what is the remaining life of a battery? That is the second type of machine learning scenario. We call these regression scenarios. The other, more advanced scenario we use in machine learning is
called optimization scenario. Generally, a human looks into
the machine learning output and makes a decision. When you are mature company-- like within Google, we
looked into our data center. And once our machine learning
model got mature, instead of human making a decision,
we let the machine learning make the decision itself. For example, in Google data
center, we saved 40% of energy by using reinforcement learning,
where the machine learning model makes a decision, like how
much of cooling power to use. So first step is you build a
classification or regression problem for determining what is
the remaining life of a machine or whether this
machine will fail. Once you achieve
maturity, then you start building
an optimization scenario. Then there is a fourth type of scenario. Many companies don't have labeled data. If you have labeled data, like historical data on when a machine failed as a [INAUDIBLE], you can build a classification or regression model. Many times, a company simply
wants to identify patterns, like where the anomalies are. Sometimes a company doesn't want to find anomalies, but wants to create a benchmark, like if millions of vehicles are in use, whether these vehicles are being used in the right way or not. If there are a lot of aircraft,
which aircraft landing is the right landing, and which
aircraft landing is an anomaly? So you can create benchmarks, and you can also detect anomalies, without using labeled data. That's the fourth
type of scenario. There's a fifth type
of scenario also. I recognized this scenario when
I was working with a customer. Sometimes it is not possible
to identify whether the device will fail or not. But many times, you are
interested in the outcome. For example, if
you are measuring the amount of water flow in the
river, and if the device fails, you can make a machine learning model to predict when the device will fail. But when the device fails, the water is still flowing in the river. The same thing also happens in industry. Sometimes, the
sensors fail, but the data is still getting generated. So can you predict
the water flow when the measuring device
of water flow fails? So there's another
type of scenario. We are calling it predictive monitoring, with a virtual sensor. You might be using a
different terminology. So this is the fifth
type of scenario. So depending upon your
need, your customer's need, brainstorm what kind of scenario is the right scenario, which scenario will give you the right value proposition. Accordingly, make a decision about what kind of machine learning model you want to build. When you have decided which
use cases you want to go after, the common problem
in machine learning is how you will
collect sensor data. So here, we are talking
about machine learning for the IoT scenario. The three key challenges are to collect data from your sensors or from equipment, to create features, and to build a model. You create features so that you bring the data into a shape where the algorithm can recognize it and work on it. Deep learning,
generally, is very good in working with
a data set, where even if you do not have the
right number of features, it can identify features itself. But generally, you want
to create features, so that you bring the
data in a good shape. Once you bring the
data in a good shape, then you can select an
algorithm and build a model. So going back again,
determine the right scenario for your customers. Once you determine
the right scenario, ensure you can collect the data. Generally, we talk about
defining a scenario. When you work on the
machine learning model, you should try to
convert a business use case into a machine
learning use case. What do I mean by that? Generally, we will say, here is a use case: we want to predict whether there will be a battery failure, or whether there will be a car failure or not. But also define in the
use case how you will use the output of your
machine learning model. What is the definition
of breakdown? Because breakdown sometimes
means device fails. Breakdown sometimes means the
device is generating more heat. Breakdown sometimes means it is
working at 70% efficiency, and it is not fully operational. Sometimes breakdown
means it is producing more vibration or sound. So define what is your
definition of breakdown, which you want to avoid. Then define what kind
of signals or patterns you have that
show that degradation. And determine how often
you have been collecting signals and then how much
of normal and failure data you have. Once you have all these
data, put those details into the use case. Because at that point in
time, your data scientist should be able to make the
right decision, like what is the definition of breakdown,
what data set you have, what you are trying
to predict, and how much of normal and
failure data you have. Then take it further. Convert a use case
into a hypothesis. For example, many times
you want to predict whether this device will fail. Sometimes you want to predict
whether this device will fail in three weeks,
one week, one month. So you want to do it
in multiple periods. So define that period. Sometimes you want to
predict whether this car will fail due to a battery
problem or a starter problem. Same thing-- the
machine will fail because of part x or part y. So convert your use case into the form of a hypothesis. And once you have converted it into the form of a hypothesis, then do the data
exploration exercise to determine whether your
data set and the use case are right for this use case or not. And I will show you example,
where we do two demos. And we will do the data
exploration for both. So these are the
general steps we go through when we build a model. We define a use case. We convert that use case into
hypothesis and [INAUDIBLE] use case. Then we do the data exploration. In data exploration, many
times, you make a decision that, yes, use case
and data set match, and we can proceed
and build a model. Then you select an algorithm,
you build a pipeline, and after you apply
the algorithm, you will have a model. And then you iterate
on improving the model performance. Then you present the
results to the business and make a decision whether you
want to take it to production or not. And if business is happy, you
put the model in production and then start monitoring it. And this cycle continues, because your data patterns will change over a period of time. So you continue to monitor it. Many times when you
do a data exploration, you realize that you do not have
the right data or this use case is not a right use case. That time, you go back
and make a decision whether you need
a different data, whether you need more
data, different data. And then you collect that data. Sometime you decide
to change a use case. Then you work with your business
to define a different use case altogether that
can be built on your data. For example, I gave
you a river example. If there is a
gauge in the river, and the gauge on
the river breaks because a tree is
floating in the river, and if somebody looks at historical data, like how often the gauges break-- well, trees float down the river randomly. So you can't build a
predictive maintenance model. So you go back to your
business and define what kind of other use
case you can build. So then you can think through
another use case-- which we'll show you in the second demo-- where you define a new use case: if a gauge fails, how can I still predict the water flow? And then you have 12 months or
six months to replace a gauge. So let's go into two demos. We'll take two examples. We'll take a predictive maintenance example. And the other one is predictive monitoring, which is a modified version of predictive maintenance. When you build a predictive
maintenance example, let's say you have defined a
use case that you have oil rigs or you have aircraft. And you have a sensor data are
coming from those aircraft. You want to determine
when this aircraft or when this oil rig will fail. So you want to look for such patterns in the data exploration. For example, speed, efficiency, and pressure reduce when a machine or an engine gets older. So for example, on day one, the speed will be good. When the machine is getting older, the speed will be lower. Same thing-- the heat, noise, and vibration generally are lower in a new machine. And as the machine gets older, they start increasing. So look for such patterns. And when you look for such patterns and you see the real evidence,
then you can make a decision. And you can be more
comfortable that you can build a machine learning model. So the example here is we used a NASA data set. This NASA data set is about a turbine engine. It's showing you four machines in four different colors-- yellow, green, blue, and red. They are failing at different points in time. You will see sensor one's value doesn't change from day one until the day the machine fails. So it doesn't have any pattern. But for sensors two, three, and four, as the engine is getting older, their values are rising steadily. And actually, after half of the life of the engine, the values are rising more rapidly. So this gives us confidence that if we use this data, we can predict the failure of an engine. So if you see this kind of pattern in your machine or your engine, it gives you confidence that the data exploration phase is good. And now, you can go ahead and
proceed with building the model.
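As a rough illustration of that exploration step, a few lines of pandas and matplotlib are enough to eyeball whether a sensor drifts as the engine ages; the file and column names below are assumptions, not the actual NASA schema.

```python
# Hypothetical exploration sketch: plot sensor channels over engine cycles
# to look for degradation trends (column names are assumed).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("engine_sensor_data.csv")  # one row per engine per cycle

fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
for ax, sensor in zip(axes.flat, ["sensor_1", "sensor_2", "sensor_3", "sensor_4"]):
    for unit_id, unit in df.groupby("unit_id"):
        ax.plot(unit["cycle"], unit[sensor], alpha=0.6)
    ax.set_title(sensor)
    ax.set_xlabel("cycle")
plt.tight_layout()
plt.show()
```

A flat line carries no signal, while channels that drift steadily as cycles accumulate are the ones worth feeding to the model.

So whether you take an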
example of an oil rig, or whether you take an
example of an aircraft, you can build similar model. So for example, if there
are a number of aircraft, and if we click
on an aircraft, we see what engines are installed. And then you can see the
data coming from that engine. For example, there's a turbine. There's the nozzle pressure,
temperature, and fan speed. This is the real-time data. You will see that every second,
it is getting refreshed. Here, there is a
historical data. And here, you can see whether
there are any anomalies or not. This is a traditional big data and IoT solution. On top of that,
there's a machine learning model that predict
what is the remaining life of the engine. Generally, the
life of the engine is measured in terms of cycles. So it shows that there are 46
cycles left for this engine. So what is happening
behind the scene is we're getting the
data from engine. And as we get the data,
it's shown on a dashboard, like what is the health of the
current data, which is good. And using the data set,
the machine learning model makes a prediction. What is the remaining
life of an engine? So as I mentioned to you
in an earlier example, first define a use case. This case is about
predicting the remaining life of an engine. It's a regression problem. Then we looked into
the engine data set. And when we looked into
the engine data set, we saw there were three
or four sensors where the values were normal. And as the engine got older,
the values were rising steadily. So because there is a degradation
pattern in those sensors' data, using those sensors'
data, we were able to build a
machine learning model that can very accurately
predict the remaining life of an engine. And you can use the same
concept in other domains also. You can use it in an oil rig. You can use it
in a machinery. So if you have a data set coming
from your machinery or engine which shows a degrade pattern,
you can build a model easily. This is the real
data set, where you can see the data shows a
pattern that the values are rising for these four sensors. Sensor two, three, four, and
seven, for three sensors, it is continuously rising. And four sensors,
it is coming down as the engine is getting older. So we talked about a use case. We talked about a data
exploration exercise. So what are the best
practices for data collection when you are building a
predictive maintenance model? These are generally
comprehensive data set or data attributes
you can collect. So if you have an IoT data,
knowing about the IoT data, like whichever type
of device you have, it might be giving you
temperature or heat, noise, vibration, voltage. Or it might be
sending you images. Generally, the IoT data is very
powerful in making a prediction whether that device
will fail or not. So time series data is the
most powerful and more useful. When you combine the data
with the static data, like what is the make and
model of an engine, what is the configuration and build, or what software is running on that engine, combining that static data with the time series data gives
you a very powerful overview. For example, in this case
also, when we built a model on the NASA data set, we had
an error rate of 45 RMSE-- root mean square error.
with the static data, which are the operational
characteristics of that engine, we were able to
reduce 45 to 5. So the error rate reduced drastically. Having IoT data is great, because it's very powerful. Combining that with the static
data makes it more powerful.
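A minimal sketch of that enrichment step, assuming the time series and the static attributes live in two hypothetical files keyed by a unit ID:

```python
# Hypothetical sketch: join static equipment metadata onto the time series
# rows before training (file and column names are assumptions).
import pandas as pd

sensors = pd.read_csv("engine_sensor_data.csv")  # unit_id, cycle, sensor_1..sensor_n
static = pd.read_csv("engine_metadata.csv")      # unit_id, make, model, build, software

# One row per unit per cycle, with the static columns repeated on every row.
frame = sensors.merge(static, on="unit_id", how="left")

# Turn the categorical metadata into one-hot columns the model can consume.
frame = pd.get_dummies(frame, columns=["make", "model", "build", "software"])
```

In the NASA example from the talk, adding this kind of operational and configuration data is what brought the RMSE from 45 down to around 5.

Many times, depending upon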
your domain knowledge, you also want to put
usage history data. For example, if
there are two buses-- one bus is used to take 20
people to office every day. Another bus is used in a
crowded place, where it takes 100 people to office every day. The second bus is likely
to fail more often. So knowing how many miles
a car has been used, how many hours it has
been used, or how much was the load on that car or
machine every time, generally helps you in making
a prediction. If you take another
example, if you are trying to predict
battery failure, if you have a car at home
which you start four or five times in a day versus
if you are a contractor, and if you start your car
50 times in a day when you go from different
homes and deliver things or fulfill some service
and move to another home, second car battery is
going to fail more often. So knowing the usage
history data also makes your model very powerful. And if you do the
maintenance on your parts, knowing about when was the
last maintenance done, when was the last service done,
adding that data set also into this data set also makes
your model more powerful. So companies will not have
all these four data set. But if you're
planning for building predictive maintenance, plan
for putting together such data set, like time series
data is most powerful. Combining this with your
equipment detail data or operational
characteristics data generally makes very good models. Depending upon your
domain, if the load makes a big difference, have
a data set about the load. And having a maintenance
data also helps. So with this, you can
build powerful machine learning model. So once you have
defined a use case, you have done the
data collection. What are the next
steps for building a machine learning model? So in predictive maintenance,
one of the common problems is you need to have labels. And many times, you need to create labels. So the example here is,
let's say you get signal data on different days. Or you might be getting it at different hours every week. So when a machine actually fails, that is the final label. But if your goal is to predict failure one week in advance, then you need to tag the one week of data before the failure as label data. That is your failure data. So that is your positive label. And the remaining data becomes your negative label. So that is additional work. If you do this exercise, then your model will become more powerful. So depending upon your use case, and how far in advance you want to predict the
failure, tag that much of the period before the failure as your label.
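As a rough illustration of that tagging step, here is a small pandas sketch, with assumed file and column names, that marks every reading in the final week before a recorded failure as the positive class:

```python
# Hypothetical labeling sketch: tag readings within 7 days of a unit's failure
# as positive examples (file and column names are assumptions).
import pandas as pd

readings = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])
failures = pd.read_csv("failures.csv", parse_dates=["failure_time"])  # unit_id, failure_time

readings = readings.merge(failures, on="unit_id", how="left")
time_to_failure = readings["failure_time"] - readings["timestamp"]

# 1 = reading falls in the week leading up to a failure, 0 = normal operation.
readings["label"] = ((time_to_failure >= pd.Timedelta(0)) &
                     (time_to_failure <= pd.Timedelta(days=7))).astype(int)
```

Change the seven-day window to whatever lead time the business actually needs.

Similarly, if you are trying to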
predict what is the remaining life of a machine, you can
easily build a deep learning model where you
have a final failure and you have the sensor data. But if you can tag the
data at various places, like for example, how the signal
was looking at 20% life, at 40% life, at 60% life, your model becomes much richer. So if you're not getting a good result in iteration one, you can try to create such labels. And then you will be able to see good results. Once you have a data
set, and then you have created the right
labels, then the next step is to create features. Depending upon your domain, you need to select what kind of features you need to create. But these are standard features: you sometimes create the minimum, or maximum, or count, or sum of various attributes to determine whether those features have a pattern. Sometimes you use tumbling averages or moving averages, because they are generally good at showing short-term or medium-term patterns of failure. So you can create tumbling
averages and rolling averages.
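A sketch of those window features in pandas might look like the following; the window sizes and column names are only placeholders:

```python
# Hypothetical feature sketch: rolling (overlapping) and tumbling
# (non-overlapping) window statistics per unit.
import pandas as pd

readings = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])
readings = readings.sort_values(["unit_id", "timestamp"])

vib = readings.groupby("unit_id")["vibration"]

# Rolling windows over the last 12 readings per unit.
readings["vibration_roll_mean"] = vib.transform(lambda s: s.rolling(12).mean())
readings["vibration_roll_max"] = vib.transform(lambda s: s.rolling(12).max())

# Tumbling daily windows: one aggregate per unit per day.
daily_mean = (readings.set_index("timestamp")
                      .groupby("unit_id")["vibration"]
                      .resample("1D").mean())
```

Depending upon your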
domain, sometimes you have to create different
types of features. So once you have a use case,
you have a data set, now, you have created labels. And after that, you are
ready to select an algorithm. So here, you will see that
depending upon your use case type-- whether it is
a classification use case, multiclass classification
use case, regression use case, or anomaly detection
kind of a use case, you can select various
different algorithms. And you can also
select whether you want to use traditional
machine learning, whether you want to use
deep neural network, or whether you want to use
a more powerful technique that also uses memory in the deep neural network, like a recurrent neural network. So generally, for a
classification problem, you will use traditional ML,
like random forest or decision trees. If your absolute
values have pattern, then you will use
deep neural network. And sometimes if the spikes
are showing a pattern, then it's better to use
recurrent neural network. I will show you an example when
we reach one of the graphs. If you see that the absolute
values of your data is showing you a pattern, then
you can use traditional ML, or you can use deep
neural network. That will give
you a good result. But if absolute values are
not showing you result, and if you see that your
failure is indicated by how steep the
spike was, then you should be considering
recurrent neural network. Similarly, for multiclass
classification, same thing. You can consider RNN, DNN,
or standard random forest kind of algorithm. You can also use RNN or
LSTM for a regression problem. Or you can use a random forest
or a hidden Markov model for a regression problem.
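As a sketch of the recurrent route for the remaining-life regression problem, a small Keras LSTM over fixed-length windows of sensor readings could look like this; the window length, feature count, and the random training data are placeholders:

```python
# Hypothetical sketch: LSTM regression for remaining useful life (RUL).
# Input: windows of 30 cycles x 4 sensor channels; output: cycles remaining.
import numpy as np
import tensorflow as tf

window_len, n_features = 30, 4
X_train = np.random.rand(1000, window_len, n_features).astype("float32")  # placeholder
y_train = (np.random.rand(1000) * 200).astype("float32")                  # placeholder RUL

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(window_len, n_features)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted remaining cycles
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X_train, y_train, epochs=5, batch_size=32)
```

If you are using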
anomaly detection, there are multiple techniques
for anomaly detection you can use. If you want to use deep neural
networks for anomaly detection, you can also consider
using autoencoders, which are very powerful in identifying anomalies in a vector. For example, if you have complex data, and you do not know which of the vectors represented by this complex data is an anomaly, you can use an autoencoder. And then there is a variation of it. It is called a conditional autoencoder, which is very powerful at finding anomalies in a data set which has complex vectors. So depending upon
your use case, try to use the simpler algorithms first-- traditional ML or a deep neural network. If that gives you a good result, you don't have to try anything else. If it doesn't give you a good result, then you should try
a recurrent neural network.
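To make the autoencoder idea concrete, here is a minimal reconstruction-error sketch; the feature count, threshold, and the random "healthy" data are assumptions:

```python
# Hypothetical sketch: flag anomalies via autoencoder reconstruction error.
import numpy as np
import tensorflow as tf

n_features = 16
X_normal = np.random.rand(5000, n_features).astype("float32")  # placeholder healthy data

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(4, activation="relu"),   # compressed representation
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=10, batch_size=64)

def is_anomaly(batch, threshold=0.05):
    """A reading is anomalous if the model cannot reconstruct it well."""
    error = np.mean((autoencoder.predict(batch) - batch) ** 2, axis=1)
    return error > threshold
```

Regarding anomaly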
detection, you should also make a decision, like
what kind of anomalies you are trying to identify. Are you trying to
identify point anomalies, like when individual data points are out of range, or contextual anomalies, like when some device was running in a certain context and it was generating more heat. Is that the kind of anomaly you're trying to identify? Sometimes you want to see whether a sequence of things was an anomaly. So try to identify what kind of anomalies you want to identify. Sometimes the scenario can
become more complex. For example, if aircraft fly
in different weather conditions from different
airports, and there are different
types of aircraft, and the runway size is different, and they use a different pattern when they are taking off-- and you want to identify which one was a right takeoff and which one was not a right takeoff-- standard anomaly detection may or may not work. At that time, you want to use
autoencoder and condition autoencoder. So you selected a use case. After that, you
selected a data set. And you have the right schema. Then you created the labels. You created the features. You selected the algorithm. And then, you go ahead
and train the model. So the traditional mechanism for training a model is you have a denormalized data set. You give it to the neural network. The ML model gets trained and starts making predictions. And you see the cost, like how far the predicted is from the actual. And that's how you get the confidence about whether your model is good or not. And then you can take
it to production. So generally, when you define a
use case at that point in time, you also define
what kind of metrics you want to optimize
depending upon your case. So I will not go into detail
about precision and recall. These are standard metrics
for classification problem. But you may want to
work with your business to ensure what is
more important. Is reducing false
positives more important, or reducing false
negatives more important? For example, if you are
doing predictive maintenance for aircraft engine, you want
to predict failure in advance. Reducing false negatives is
generally more important. But if you are trying to
do predictive maintenance for a car battery, you do not want to replace the battery unnecessarily either. So reducing false positives is also a good goal. So you need to work with your business in advance to determine how you want to trade off between false positives and false negatives, so that you can make a decision on when to move the model to production. And generally, predictive maintenance data sets are not balanced data sets. So you do not use the standard accuracy metric. Other metrics to consider are, like, how much attention you want to pay to the true positive rate or the true negative rate, and the false positive rate or false negative rate. So work with your
business to understand what metrics are right for it. Generally, data scientists are good at tuning toward the metrics. So even if you agree to pay attention to multiple metrics, agree on one metric which you will optimize, and for the remaining metrics you may want to have a satisfying criterion. That way, your data science team knows what the minimum bar is for the remaining metrics. And then they can continue
to optimize your optimization metric.
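For reference, a tiny sketch of checking that trade-off on a validation set; the labels and probabilities here are made up, and sweeping the decision threshold is what trades false negatives against false positives:

```python
# Hypothetical sketch: sweep the decision threshold of a trained classifier
# and watch how false positives and false negatives move.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_val = np.array([0, 0, 1, 0, 1, 1, 0, 1])                    # placeholder truth
p_val = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.9, 0.05, 0.6])  # placeholder scores

for threshold in (0.3, 0.5, 0.7):
    y_pred = (p_val >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
    print(f"threshold={threshold}: FP={fp} FN={fn} "
          f"precision={precision_score(y_val, y_pred):.2f} "
          f"recall={recall_score(y_val, y_pred):.2f}")
```

One other small thing--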
if you are building a multiclass
classification model, then you need to compute that metric for the multiclass classification. So there are two methods used-- micro and macro. Micro generally gives attention to each instance: you sum up the true positives and so on across each instance. In the macro method, you give importance to the whole class. You can determine which classes you want to pay attention to. You compute the metric for one class at a time, and then you average it out
over the overall classes.
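A quick, hypothetical illustration of the difference using scikit-learn's averaging options:

```python
# Hypothetical sketch: micro vs. macro averaging of a multiclass metric.
from sklearn.metrics import f1_score

y_true = ["battery", "battery", "starter", "starter", "alternator", "battery"]
y_pred = ["battery", "starter", "starter", "starter", "battery", "battery"]

# Micro: pool every instance together before computing the score.
print(f1_score(y_true, y_pred, average="micro"))

# Macro: compute the score per class, then average the classes equally.
print(f1_score(y_true, y_pred, average="macro"))
```

So here is an example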
of a regression problem. I'll show you an
aircraft engine data set. When we used the deep learning network with the IoT data, we were getting a metric of 45 RMSE. But even if 45 looks like a very big number, it was OK, because the actual numbers were high. When you see the 2017 result-- this is the result we were getting last year-- in the beginning, when the engine was new, it was predicting with a high error rate. Let's say that on July 1 the remaining life was 120 weeks, and it might be predicting 90 weeks. So even if the error was 30 weeks, we still knew that there was a lot of life left. But as the engine became older, it started predicting the right value. So it was still acceptable to the business in this case-- OK, an RMSE of 45.
RMSE of 2 to 5. Then overall, the results
were almost correct. So the results were very good. So you should try to
add more data element, like what is the operational
characteristic when an engine is running. That will generally give
you a better result. So we talked about
the common use cases. And we take an example of
a engine scenario, where we use a NASA public data set. And we predicted what is the
remaining life of an engine. Let's take another example. We'll use a USGS data set. This is a United States
Geological Survey. They collect data about how
much is the water flow happening in the river. In the use case, they
wanted to predict water flow in the river. And the reason was
when a flight comes, the gauges generally break. And this is the time the
government needs the data. Because emergency
response team needs to know where to locate people. And this data is also
used on a daily basis, like how much of water should be
given to farms or agriculture, and how much should be
carried forward in the river. Whether dam should be
storing the water, or dam should be releasing the water. So USGS survey
collects this data. And this is used for flood risk,
water distribution, reservoir, dam management, and it
is used for agriculture. Now when these gauges break,
we looked into the data. And the gauge's breakage
was very random. For example, in
California, gauges don't break very often. But in Alaska, gauges break very
often because the water spikes are very big there. And generally, these gauges
break either during storm time or randomly-- there is a tree
or debris coming in the river, and that breaks the gauge. So here, you will see
an example of a gauge: the gauge has
a part in the river. And there is some part above it. And if something is
floating in the river, it can break the gauge. And you will see there are 8,200
gauges built across the nation. And these gauges have
a huge cost also. They're important for life. But they also
require $184 million to maintain all
these 16,300 gauges. And when a gauge breaks during
the storm time or a flood time, that is the time when the government needs this data most. And sending somebody to replace
the gauge is very dangerous. Even if you predict in
advance that this gauge is going to break and
you replace it first, the new gauge will also break. So here, we discussed
with business that predictive maintenance
is not a right use case. So something else
should be done. So we came out
with a new scenario here that instead of predicting
when the gauge will break, we should start predicting
the water flow in a river. Predictive water flow
in a river is, again, not straightforward. Because it's not like
sales forecasting, which has seasonal attributes. Water flow in a river depends
upon the weather, rain. It also depends upon how
much snow is melting. And how much snow
is melting depends upon how much snow was
accumulated last year or the last few years. And that is a variable,
and one can't build a model very easily. There's a lot of research
happening for last many years. And that problem is
still not solved. So what we did in this case
is we looked into the data that we know that we
can't predict the water flow in a river. But we looked across
the watershed. Here, if you look
into this example, we did a data
exploration exercise. This exercise done in
Google Data Studio. We looked into the USGS data. There are gauges in a watershed. You will see that there is at
least a pattern when the water level is increasing,
it is increasing in most of the gauges. And when it is decreasing,
it is decreasing in most of the gauges. There's no straightforward
correlation. But there is some pattern. You will also see that
there is a negative pattern. Like if a dam is
storing a water, or a reservoir is
storing a water, during that time, the
river connected below it, their water flow reduces. When a dam or a reservoir
release the water, suddenly, the dam
water flow reduces, or the reservoir
water flow reduces. But the water flow
downstream increases. So there is either a
big positive correlation or a big negative correlation. So first exercise we did was
using machine learning model, we tried to identify
what set of gauges has a correlation--
positive or negative. And then we
eliminated gauges that don't have a correlation. We also did some more
narrow correlation exercises. Here, you will see that
these gauges, either the data or the water flow increases
suddenly in all of them or reduces suddenly
in all of them. Then we looked at a sheet from BigQuery, where we built this very simple correlation metric in BigQuery, like which gauges act together and which gauges
don't act together.
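As a rough sketch of that kind of check, assuming the readings land in a BigQuery table with gauge_id, reading_date, and flow columns (placeholder names, not the actual USGS schema):

```python
# Hypothetical sketch: pairwise correlation of daily flow between two gauges,
# using the BigQuery client library (project, table, and columns are assumptions).
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT CORR(a.flow, b.flow) AS flow_correlation
FROM `my_project.hydro.daily_flow` a
JOIN `my_project.hydro.daily_flow` b ON a.reading_date = b.reading_date
WHERE a.gauge_id = @gauge_a AND b.gauge_id = @gauge_b
"""
job_config = bigquery.QueryJobConfig(query_parameters=[
    bigquery.ScalarQueryParameter("gauge_a", "STRING", "06192500"),
    bigquery.ScalarQueryParameter("gauge_b", "STRING", "06195600"),
])
result = list(client.query(sql, job_config=job_config).result())
print(result[0].flow_correlation)
```

Once we had this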
data set, then we had high confidence that we
could go ahead and build a model. At that time, we built a model. And then we were able to make predictions with very high accuracy. The red line shown here is the predicted value. And the blue line shown here is the actual value. So this is on a
training data set. This is on a
validation data set. It's [? close. ?] And
I'll show you a demo. This is on a real data
set for the month of July, which we have never seen before. It's predicting
quite accurately. So here is a simple application. This area is about
Montana river. So it has a number of gauges. If I click on a gauge,
I see the water flow, like how much is the water flow. And then I can go here and
see what is the water flow. And assuming this gauge
was missing since May, these are the actual
values from May. Assuming this gauge was
missing, the purple value was the predicted value. So it would have
predicted very accurately when this gauge was broken. So in this case, we
looked into the use case. The use case was, can we predict
the failure in the water gauge? Then we determined that this
use case is not a right use case for this problem. Then we shifted the problem. Can we predict the water flow? Predicting the
water flow, again, is a very complex problem. There is a research happening
since last 20 years. Because it depends
upon the snow melting, nobody is able to predict it. So we used a workaround
technique here, where we took a set
of gauges together. We identified that these
gauges act together. There may not be a very
strong correlation, but they act together. And then, we built two machine learning models. One does the clustering of the gauges. The second one does the prediction. So if, during a storm, let's say you have a cluster of 10 gauges, and five of these gauges break down, you can still predict the broken gauges from the remaining five. And you don't have to replace them immediately. And you can predict for nine to 12 months very easily. And the prediction accuracy will not go down. Can we shift to the slides? So this is something I [? pointed ?] out. We created clusters of gauges. Then we applied machine learning. We were able to build
models that can very accurately predict the water flow.
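A very rough sketch of that two-model idea with made-up gauge columns: first cluster the gauges by how their daily flows move together, then regress a broken gauge on the healthy gauges in its cluster.

```python
# Hypothetical sketch: cluster gauges by flow correlation, then predict a
# broken gauge's flow from the healthy gauges in the same cluster.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

flows = pd.read_csv("daily_flow_wide.csv", index_col="date")  # one column per gauge
target = "gauge_06192500"

# Model 1: group gauges whose flows rise and fall together.
labels = KMeans(n_clusters=5, random_state=0).fit_predict(flows.corr())
cluster_of = dict(zip(flows.columns, labels))
peers = [g for g in flows.columns if cluster_of[g] == cluster_of[target] and g != target]

# Model 2: predict the target gauge from its cluster peers.
reg = LinearRegression().fit(flows[peers], flows[target])
predicted_flow = reg.predict(flows[peers].tail(30))  # e.g., while the gauge is down
```

So whether you're building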
predictive maintenance for an oil rig, or for an aircraft, or for a river, if you can collect the data using your IoT devices-- so in this case, within Google, we used Google Cloud IoT Core. And we used Pub/Sub. Those are our IoT components. Then we used Dataflow
and BigQuery for data processing and storage. And we used Cloud ML
for the model building. This is a reference architecture
you can pretty much use in most of the use cases
where you have a IoT data, and you want to build
predictive maintenance solution. We didn't get the time
to talk about edge. In many cases, you want to
build predictive maintenance on the edge also. So we have TensorFlow Lite. You can take your TensorFlow or Cloud ML model, and you can compress it very easily. And once you compress it, the model becomes lighter. And you can deploy
it on an edge device. So that's all we have for today.
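For reference, a minimal sketch of that compression step with the TensorFlow Lite converter; the SavedModel path is a placeholder:

```python
# Hypothetical sketch: compress a trained TensorFlow SavedModel for edge
# deployment with the TensorFlow Lite converter.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("export/rul_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g., weight quantization

tflite_model = converter.convert()
with open("rul_model.tflite", "wb") as f:
    f.write(tflite_model)
```

So in summary, if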
you have IoT data, discuss the use case with the business, agree on the metrics, then do the data exploration to determine whether your data set and use case match. And do a checkpoint. Once you pass the checkpoint, build a predictive maintenance use case. If a predictive maintenance use case is not right, think about a predictive monitoring use case. And then, if it meets the value prop, build either of these. And focus on creating labels. Focus on creating features and, finally, training the model. And you will be able to
build an end-to-end solution. [MUSIC PLAYING]