Essential Tools for Machine Learning - MATLAB Video

Captions
Hello, and welcome to this webinar on machine learning. My name is Shyamal Patel, and I'm a technical product manager at MathWorks working on products in the area of statistics, machine learning, and deep learning. Prior to joining MathWorks, I was working on developing algorithms for human health monitoring by applying signal processing and machine learning techniques to data gathered using wearable sensors. As the title suggests, in this webinar we will talk about some essential tools that will enable you to quickly develop and deploy robust machine learning models using MATLAB.

Machine learning has become quite pervasive today. Here are some examples of applications where machine learning has enabled performance that would have been very difficult to achieve with traditional modeling techniques. Let's take a look at the characteristics of these problems that make them such a good fit for machine learning. Problems such as speech recognition, object recognition, and engine health monitoring are too complex for handwritten rules or equations; machine learning algorithms are designed to learn such complex nonlinear relationships. Weather forecasting, energy load forecasting, and stock market prediction are examples of applications where the underlying system is constantly changing, so the solution also needs to change and adapt; machine learning algorithms can quickly learn from new data and thus keep up with such dynamic systems. Applications like IoT analytics, taxi availability analysis, and airline flight delay prediction typically involve learning from large amounts of data, and machine learning algorithms are designed to learn efficiently from such large data sets.

In this webinar we will focus on one such application where machine learning is a good fit. Using a real-world data set, we will follow the machine learning workflow to go from raw sensor data to real-time classification of heart sounds, as shown in this video. Along the way, we will identify several essential tools that enable us to quickly develop and deploy robust machine learning solutions.

Here is the agenda for this webinar. We will start with a quick overview of machine learning and talk about the typical elements of a machine learning workflow. We will then work through an example in which we will develop a solution based on machine learning techniques to classify heart sounds. Finally, we will review some of the key challenges and talk about how tools available in MATLAB help address them.

In simple terms, machine learning uses data and produces a program to perform a task. This ability is especially useful for complex problems where closed-form solutions might be too difficult or even impossible to derive. Let's take the example of human activity detection using a smartphone. Our goal here is to use the internal tri-axial accelerometer to determine when the person is performing activities like sitting, standing, and walking. With a traditional approach, we might be able to come up with some handwritten rules based on empirical observations, or we might try to derive a formula. In either case, the solution will be suboptimal, as the problem is quite complex and the solution space has many combinatorial possibilities. A machine learning solution, on the other hand, would involve collecting large amounts of sensor data with corresponding activity labels and then allowing the algorithm to learn the complex nonlinear relationships between inputs and outputs. Depending on the complexity of the problem, we can choose to use a simple algorithm, for example Naive Bayes, or go with something more complex like support vector machines. Our task here is to select the most appropriate technique and avoid pitfalls such as overfitting.
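As a minimal sketch of what this kind of model comparison looks like in MATLAB (not shown verbatim in the webinar; the data file and label names are hypothetical, and a two-class label is assumed because fitcsvm trains binary classifiers):

    % Train two candidate classifiers on the same feature table
    data = readtable('activityFeatures.csv');  % predictors plus a label column
    mdlNB  = fitcnb(data, 'Activity');         % simple: Naive Bayes
    mdlSVM = fitcsvm(data, 'Activity');        % more complex: binary SVM
    % Cross-validated loss is a quick first check for overfitting
    lossNB  = kfoldLoss(crossval(mdlNB));
    lossSVM = kfoldLoss(crossval(mdlSVM));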
Machine learning can be broadly divided into supervised and unsupervised learning. In supervised learning, the data set includes input values as well as corresponding outputs, and our task is to train a model to predict or estimate the output on new data. If the output has discrete values, we refer to it as a classification problem, whereas if the output is continuous, we refer to it as a regression problem. For example, if you are trying to classify tumor size as small, medium, or large, it is a classification problem, whereas if you are trying to predict electricity demand from a grid, say in kilowatt-hours, it is a regression problem. In the case of unsupervised learning, our task is to group the data based on some measure of similarity. The input data set does not have corresponding outputs or labels, so many different solutions are possible, and our goal is to discover naturally occurring patterns in the data. Clustering techniques fall under the category of unsupervised learning. An example of unsupervised learning is the problem of determining optimal placement of antennas so that all cellular customers receive good service. Here are some examples of techniques under each of these categories: support vector machines and discriminant analysis are examples of classification techniques; linear regression and regression trees are examples of regression techniques; and k-means and Gaussian mixture models are examples of clustering techniques.

Let's talk about the steps involved in a typical machine learning workflow. Accessing and exploring the data is almost always the first step. This step involves reading data from different sources, like various file formats, databases, or even streaming data from sensors. The first step is usually followed by the preprocessing step, in which raw data is manipulated and transformed for consumption by machine learning algorithms. This involves cleaning messy data (for example, outliers or missing values), extracting features or predictors by using domain knowledge (for example, image features), and applying various data reduction and transformation techniques to keep only the most relevant information. Once the data is preprocessed, it is ready for the third step, the development of predictive models. In this step we have a lot of flexibility with respect to the learning techniques we want to use, as well as the associated parameters of each technique. Typical tasks include training and comparing multiple models, optimizing model parameters, and validating model performance to ensure robustness. The final step is to share and deploy trained models by integrating them into analytics pipelines. Models may be run on embedded systems, shared as standalone applications, or even deployed to the cloud. The goal here is to rapidly build and deploy accurate models.

Now let's work through an example that will help us identify some of the pitfalls in this workflow, as well as tools that can enable us to deal with them effectively. This example is based on the heart sounds data set that was used in the 2016 PhysioNet/Computing in Cardiology Challenge. Heart sounds contain a wealth of information that can enable early diagnosis of serious pathologies of the cardiac system. Because listening to heart sounds requires only a simple device, it is an ideal screening tool that can be used in the field to identify and refer at-risk individuals for further testing. However, the heart sound signal is highly complex, and reliable diagnosis often requires a trained clinician. In this example, our objective is to develop an algorithm that can be used for automatic classification of heart sounds, with the goal of deploying the final solution on an embedded system for real-time diagnosis. The data set consists of a training set and a validation set. The training set consists of 3,240 recordings, and the validation set consists of 301 recordings. The recordings vary in length, ranging from 5 seconds to more than 120 seconds. Each recording comes with a label of either normal or abnormal.

In the heart sounds example, we will focus on a subset of elements from the workflow that we reviewed earlier. Our data set consists of files with recordings of sensor data. In the preprocessing step, we will focus on feature extraction and feature selection to identify the most relevant features. To develop our predictive model, we will train and compare multiple models, perform parameter optimization, and validate our model on new data. Finally, we will take our machine learning solution and prepare it for deployment to an embedded system. Throughout this example we will use MATLAB as the programming language. In the preprocessing step, we will rely on the capabilities provided by the Signal Processing Toolbox and the Wavelet Toolbox for feature extraction, and the Statistics and Machine Learning Toolbox for feature selection. We will use apps and algorithms from the Statistics and Machine Learning Toolbox for developing predictive models, and we will use MATLAB Coder to generate C code from MATLAB code for deployment. We will use the Live Editor to work through this example. For those of you who are not familiar with this feature, the Live Editor is an interactive way to write code and generate computational narratives that are easy to share and communicate.

In the heart sounds data set, each recording is an audio file with a normal or abnormal label. Let's start by plotting and listening to what an abnormal heart sounds like. MATLAB provides functions like audioread and audioplayer, which make it easy to work with such audio files. Now let's listen to what a normal heart sounds like. Okay, that was a bit quick, so let's play them again: here is the abnormal heart, and here is the normal heart. While we are not experts, we certainly notice some differences. The abnormal heart seems to have higher frequencies and a noise-like quality in between beats. The normal heart, on the other hand, is more regular, with silence between beats.
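A minimal sketch of this load-plot-listen step (the file name is hypothetical; audioread and audioplayer are the functions named in the webinar):

    % Load one recording, plot it, and play it back
    [y, fs] = audioread('a0001.wav');     % signal samples and sample rate
    t = (0:numel(y)-1)/fs;                % time axis in seconds
    plot(t, y), xlabel('Time (s)'), ylabel('Amplitude')
    player = audioplayer(y, fs);
    play(player)                          % playback runs asynchronously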
Now that we have some idea of what the signals look and sound like, let's take a look at them in the frequency domain. The Signal Processing Toolbox provides an app called the Signal Analyzer that enables us to do this without writing any code, so let's open up the app. In the Signal Analyzer app we can look at the two signals side by side: on the left we have the normal heart sound signal, and on the right we have the abnormal heart sound signal. We can clearly notice some differences, as we saw earlier in the Live Editor. Now we can take a look at the power spectrum for both signals, and we can immediately notice the presence of some higher frequencies in the case of the abnormal heart sound. We can also use the panner feature, which allows us to look at a short segment of the signal at a time and pan across the signal to see whether there are any changes in the power spectrum. As we pan across the signal, we again observe the presence of some higher frequencies for the abnormal heart sound signal. Now that we have some idea of what the signal looks like in the frequency domain, let's go back to the Live Editor and work on feature extraction.

Next, let's start preparing for the preprocessing step by taking a look at how the files are organized in this data set. The data set that we are using in this example includes more than three thousand files spread across multiple folders. The two main folders are training and validation. Under the training folder we have multiple folders, from A to F, each of which contains several hundred files corresponding to individual heart sound recordings. This folder also contains a reference file, which includes the labels associated with each of these recordings. Rather than writing code to navigate through these folders to access the data files, we will create a fileDatastore object. A fileDatastore creates a collection of files and uses a read function to load the files sequentially into memory for processing. This enables us to work with large collections of files without worrying about the number, organization, or naming of the files. Next, we will read the reference files and create a table with file names and corresponding labels. We will use this reference table during feature extraction to build the feature table.
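A minimal fileDatastore sketch (the folder path is hypothetical; fileDatastore, hasdata, and read are standard datastore functions):

    % All recordings under the training folders, read with audioread
    fds = fileDatastore('training', ...
        'ReadFcn', @audioread, ...
        'IncludeSubfolders', true, ...
        'FileExtensions', '.wav');
    while hasdata(fds)
        y = read(fds);    % next recording loaded into memory
        % ... extract features from y here ...
    end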
We now move from the access-and-explore stage to the preprocessing stage. In this step we will process the raw heart sound signal and extract a set of features from it using five-second non-overlapping windows. The bulk of the computation happens in the extractFeatures function, so let's take a look at it. In order to capture as much relevant information as possible, we extract a large set of about 71 features from each window. The extracted features include summary statistics like mean, median, and standard deviation; features from the frequency domain, like dominant frequency and spectral entropy; as well as features extracted using wavelet analysis. We also extract features from the speech processing domain called mel-frequency cepstral coefficients. All these features capture information from different perspectives, and they may or may not be relevant to our problem. So let's go back to the Live Editor and extract these features from our data set. Depending on how many features you are extracting and the size of your data set, feature extraction can be a time-consuming step. In order to speed it up, we can use parfor from the Parallel Computing Toolbox to distribute computations across all available cores, either locally or on a computer cluster.
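A rough sketch of windowed extraction with parfor (assuming y and fs for one recording as returned by audioread; only a few of the roughly 71 features are shown, as a stand-in for the webinar's extractFeatures helper):

    winLen = 5 * fs;                       % five-second non-overlapping windows
    nWin = floor(numel(y) / winLen);
    feats = zeros(nWin, 3);
    parfor k = 1:nWin                      % iterations run on parallel workers
        w = y((k-1)*winLen + (1:winLen));
        feats(k, :) = [mean(w), median(w), std(w)];
    end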
For this example, I have calculated the features earlier, so I'll skip feature extraction and load the precalculated feature set. Let's take a look at the features. Here we can see the first five observations from the features table. The table is organized with columns corresponding to different features, or predictors, and rows corresponding to individual observations. For each observation, the final column corresponds to the class, which can be abnormal or normal.

Our next step is to quickly train and compare multiple classification algorithms. The Classification Learner app has been designed exactly for this purpose, so let's launch the app and take a look. In the Classification Learner app we can start a new session. The first step is to select the table or matrix to work with; in this case, we are going to use the features table. The second step is to assign each variable as a predictor or the response, or choose to ignore it. In this example we are going to use all the variables and assign the last variable, class, as the response. The next step is to select a validation strategy. Because we have lots of observations in this example, we are going to use holdout validation, with 50% of the data for training and 50% for testing. So let's start the session.

The first view that we get from the Classification Learner is a scatter plot, with one feature on the x-axis, another feature on the y-axis, and the points labeled by class. This is a good view for quickly looking at our features and seeing which ones might carry useful information. When you have lots of features, as in this case, it is hard to look at all the possible combinations; this is a very high-dimensional data set. So instead of exploring the data set in a scatter plot, let's start training some classifiers. In the Classification Learner app we have access to a number of different classifiers: decision trees, logistic regression, support vector machines, nearest neighbor classifiers, as well as ensembles of different classifiers. Another nice feature of the app is that you can select classifiers individually, or you can choose to train multiple classifiers simultaneously. For example, I can select all support vector machines and use the ability of the Classification Learner app to train these classifiers in parallel to quickly train and compare their performance. I have access to 12 cores on a MathWorks cluster, and I can use all of them to train my classifiers pretty quickly. Here the training results show that the cubic SVM as well as the Gaussian and quadratic SVMs seem to give good performance.

Let's take a look at what these performance numbers mean. The confusion matrix tells us that the misclassification rate for the abnormal class is significantly higher compared to the normal class, and this is most likely because of the imbalance between the number of observations: the number of observations for the normal class is significantly higher than for the abnormal class, which increases the likelihood that our classifier is biased. The other aspect to consider is that, in the problem we are working on, it might be better to misclassify normals instead of misclassifying abnormals; that is, it is better to be more accurate at classifying abnormals at the cost of misclassifying normals as abnormals. Because this is a screening tool, the consequence of not flagging someone who is abnormal is significantly higher compared to screening someone who is normal and then figuring out that they don't have a medical condition. The ROC curve is another useful tool for assessing performance; it gives us an idea of the trade-off between true positives and false positives. In this example, as I said earlier, we might be okay with a higher false positive rate for the normal class in favor of increasing the true positive rate for the abnormal class. So let's go back to the Live Editor and dig deeper into some of these issues.

In order to train the classifier programmatically, let's first split the data set into training and testing with a 50/50 ratio. The cvpartition function makes it easy to create such splits. If we take a look at the number of observations for the normal and abnormal classes, we can clearly see the imbalance that could lead to a biased classifier. This is a common challenge when it comes to supervised learning, so let's now train a classifier that takes this data imbalance into account. To compensate for data imbalance, we can assign a higher misclassification cost to the class with fewer observations. In our example, we are interested in maximizing the detection of true positives for the abnormal class, so we will assign a 10x cost to misclassifying an abnormal as normal. In other words, we are going to tell the classifier that it is okay if someone who is normal gets misclassified as abnormal, rather than the other way around. However, if the false positive rate is too high, the examination will become useless, so we must try to achieve a good balance between higher accuracy for the abnormal class and tolerance for false positives for the normal class.
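A minimal sketch of the split and the cost-weighted SVM (the table and column names are assumptions; in the cost matrix, rows are true classes and columns are predicted classes, so the entry below charges 10x for predicting normal when the truth is abnormal):

    c = cvpartition(featureTable.class, 'HoldOut', 0.5);   % 50/50 split
    trainTbl = featureTable(training(c), :);
    testTbl  = featureTable(test(c), :);
    mdl = fitcsvm(trainTbl, 'class', ...
        'ClassNames', {'abnormal', 'normal'}, ...
        'Cost', [0 10; 1 0]);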
Another challenge associated with classifier training is hyperparameter tuning. Most classifiers have one or more parameters that can be tuned to achieve a better fit; in the case of support vector machines, one can tune the box constraint, kernel scale, and kernel function. Tuning parameters manually can be a tedious process and lead to suboptimal values. On the other hand, exhaustive grid search approaches can be time-consuming and slow down the iterative process of machine learning. MATLAB provides a Bayesian-optimization-based method for hyperparameter tuning that is tightly coupled with the fit function API: by specifying name-value parameters in the training function, the method efficiently searches through the parameter space and finds optimal values for the hyperparameters. Let's take a look at how hyperparameter tuning with Bayesian optimization works. The method builds a Bayesian model of the function that it is trying to minimize and uses that model to select the next point at which to evaluate the function. This approach allows the method to quickly search the parameter space and identify optimal parameter values. It is specifically designed to minimize the number of function evaluations and is very efficient compared to grid search and other standard optimization techniques. As you can see here, the method has completed all the iterations and has found optimal values for the box constraint and kernel scale, which it has automatically selected for the classifier.
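A minimal sketch of the tuning call (names continue from the earlier sketch; the kernel choice and evaluation budget are assumptions):

    % Bayesian optimization over box constraint and kernel scale,
    % keeping the misclassification cost from before
    mdl = fitcsvm(trainTbl, 'class', ...
        'KernelFunction', 'gaussian', ...
        'ClassNames', {'abnormal', 'normal'}, ...
        'Cost', [0 10; 1 0], ...
        'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'}, ...
        'HyperparameterOptimizationOptions', ...
            struct('MaxObjectiveEvaluations', 30));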
Let's go back to the Live Editor and look at the performance of a classifier that has been trained with the misclassification cost and hyperparameter tuning. I have saved a previously trained classifier, so instead of training it here, I'm just going to load it and look at its performance. Here is the confusion matrix as a heat map, and we can see that by taking the misclassification cost into account, we have been able to significantly increase the accuracy for the abnormal class. However, this has come at a cost for the normal class, and we are now significantly misclassifying normals as abnormals. Let's see what we can do about this.

A potential reason for poor performance of a classifier is overfitting, and one approach for dealing with overfitting is to reduce the number of parameters or features by performing feature selection. Feature selection is an important step in the machine learning workflow: it results not only in savings in terms of computational cost and storage requirements, but also in simpler models that are less likely to overfit. In this example we extracted 71 features, so some of them might be redundant or might not carry any useful information. Let's go back to the preprocessing step and use a feature selection technique called neighborhood component analysis, or NCA for short. NCA is a powerful technique for feature selection, as it is able to handle very high-dimensional data sets (significantly higher dimensions than what we are dealing with) as well as data sets that are extremely large, with lots and lots of observations. The other factor here is that, as a user, you have control over the lambda parameter, which is the regularization parameter; by increasing the value of lambda, you can control the sparsity of the solution and minimize the redundancy of the selected features. So let's run feature selection on our data set and see which features get selected. Here, the NCA method went through our features and, out of the 71, identified only 11 features that carry useful information.
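A minimal fscnca sketch (assuming X is the numeric predictor matrix and labels is the class vector; the lambda value and selection threshold are hypothetical and should be tuned for your data):

    lambda = 0.002;                    % regularization parameter
    ncaMdl = fscnca(X, labels, 'Lambda', lambda, 'Solver', 'sgd');
    selectedIdx = find(ncaMdl.FeatureWeights > 0.01);   % keep high-weight features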
Now let's go ahead and retrain our classification model with only the 11 selected features; we will use the selected features and also perform hyperparameter optimization to train a new classifier. Here again, as before, I have saved a previously trained classification model trained on the selected features, and I'm going to simply load it and look at its performance. The performance on the abnormal class has improved slightly, but there is a significant improvement in the performance on the normal class. This indicates that there was a potential for overfitting, which was addressed by the feature selection step. At this point in our analysis, we might be happy with the performance that we are getting from the trained classifier: it is able to accurately detect more than 98% of the abnormal cases and almost 75% of the normal cases. The 25% misclassification rate for normal cases might be okay, as this is just a screening test and these cases will be further evaluated by experts.

The next step for us is to deploy our analytics. Because the goal is to use heart sounds for diagnosis in the field, one deployment solution is to deploy the analytics to an embedded system, which can process data and provide results immediately. MATLAB provides support for code generation, which makes it easy to automatically generate code for deployment to various embedded targets. Let's take a look at the process, which consists of three steps. First, we save the trained model using the saveCompactModel method. Next, we create an entry-point function, in this case called classifyHeartSounds. classifyHeartSounds is a function that takes raw sensor data as input and produces a classification label of normal or abnormal after passing the raw sensor data through feature extraction and the trained classification model. The final step is to use the MATLAB Coder app to automatically generate the C code. So let's take a look at what the Coder app looks like and the steps involved. The MATLAB Coder app provides a guided, step-by-step process to generate standalone C and C++ code from MATLAB code. The first step is to select the entry-point function. The next step is to define input types; the app can do this automatically by evaluating the entry-point function. For this example, we don't want the signal to be a fixed length. Next, the app checks for any runtime issues and prompts you to fix problems and replace incompatible functions; it looks like we did not have any issues. The final step is to generate C or C++ code. The app has successfully generated C code, which can now be deployed to various embedded targets. The final step is to validate the generated C code by processing files from the validation set. This can be done by using the MEX file as an interface between MATLAB and the C code. In this plot we can see the various heart sound signals as well as their actual and predicted labels. Similar to the performance on our testing set, most of the errors are associated with misclassification of normal as abnormal.
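Putting the narrated three-step process into a rough sketch (saveCompactModel and classifyHeartSounds are named in the webinar; the extractFeatures helper, file names, and the variable-size input type are assumptions):

    % Step 1: save the trained model for code generation
    saveCompactModel(mdl, 'HeartSoundModel');

    % Step 2: entry-point function, saved in its own file classifyHeartSounds.m
    function label = classifyHeartSounds(y, fs)  %#codegen
        mdl = loadCompactModel('HeartSoundModel');
        feats = extractFeatures(y, fs);   % same features as in training
        label = predict(mdl, feats);
    end

    % Step 3: generate C code; the signal input is declared variable length
    % codegen classifyHeartSounds -args {coder.typeof(0, [Inf 1]), 0}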
Let's review some of the challenges that we encountered during this workflow. Data access and exploration often involves working with data that comes in all shapes and sizes; real-world data sets are messy and not always tabular. Preprocessing data requires domain-specific algorithms and tools; for example, signal or image processing algorithms are required to extract useful features from signal and image data. When dealing with high-dimensional data sets, selecting the right set of features is important to avoid overfitting. Developing and selecting predictive models that generalize well requires training and comparing multiple algorithms, and algorithms often require careful parameter tuning, which can be a time-consuming process. Integration and deployment of analytics often requires translating code for, or interfacing with, different hardware and software platforms, and this can be a bottleneck for applications that require frequent model updates. Finally, machine learning workflows never proceed in a convenient linear fashion; we constantly have to go back and forth, iterate, and try different ideas.

For the first challenge of data diversity, we saw how MATLAB can work with different types of data; MATLAB can also work directly with financial data feeds, text, geospatial data, and several other data formats. MATLAB has high-quality libraries of industry-standard algorithms and functions that enable feature engineering without requiring deep domain expertise, and it also provides tools for evaluating and selecting the right set of features for both tall and wide data sets. With interactive, app-driven workflows, we can quickly train and compare models, so the focus can be on machine learning and not on programming and debugging. Tight integration of parameter tuning techniques enables development of models with optimal parameter values. MATLAB enables deployment of analytics on a broad range of platforms, ranging from standalone apps to enterprise-scale systems, and code generation makes it easy to produce standalone code that can be rapidly deployed to embedded systems. Finally, MATLAB is an inherently flexible modeling environment and a complete programming language, with no restrictions on the customizations you can make to your analysis. This makes MATLAB an excellent platform for machine learning.

To summarize, these are the key takeaways about how tools available in MATLAB enable you to rapidly develop and deploy machine learning models. If you are interested in learning more, please take a look at the product documentation; there are plenty of examples and getting-started pages that can help you master these tools. To learn more about algorithms, application areas, examples, and webinars related to machine learning, please feel free to visit the machine learning page. That brings us to the end of the webinar. Thanks for joining us today. In closing, I'm going to pause for a few minutes while we gather questions; if you have any questions you want to ask, please enter them in the Q&A panel now.
Info
Channel: MATLAB
Views: 107,369
Keywords: MATLAB, Simulink, MathWorks
Id: k_BrPj3TcTE
Length: 35min 59sec (2159 seconds)
Published: Wed Apr 12 2017