We'll be starting just as soon as we've got the YouTube Live channel up and running as well — we're having a few technical issues, apparently. Okay, great, it looks like we're all connected, so welcome everybody to this webinar. I'm Jon McLoone; I lead the Technical Services team at Wolfram Research here in Europe, where we do projects for customers using Wolfram technology — very often data science projects. Today I'm going to be talking about getting started with AI. The idea is that this will be an introductory session: we'll walk through some of the very basic concepts, from tools that happen to use AI within the Wolfram Language right up to modern generative AI with things like ChatGPT, and I'm going to try to cover that in the space of an hour.

You can ask questions: those of you watching on BigMarker can ask through the chat panel on the right-hand side, and if you're watching on YouTube Live, just post comments into the comment section below the video. I will probably take all of those questions at the end of the presentation, which will take about an hour. If I see questions beforehand I might take them early, but I'm not very good at watching the question channel and talking at the same time. So let's share a screen here so you can see what I'm seeing, and we'll get started. Right, hopefully that's sharing now.

What we're going to cover in this session is some complete basics of machine learning — what even is machine learning, and how does it lead to AI-type decisions. We'll go through the basics of automated machine learning in the Wolfram Language, and we'll look at the process it goes through: how you extract features, classify, predict, and test what you've done. Then we'll get into some more advanced neural networks, and at the end we'll touch on generative AI and things like GPT.

As I said, this is an introduction, so some of you, I know, will be Wolfram Language users already, and many of you not. I'm going to introduce a few basic concepts of machine learning first — the types of task — and then a little bit of basic syntax, just the minimum for you to be able to understand what's going on.

So the first question is: what do we even mean by machine learning, and how does that get muddled up with artificial intelligence? AI, of course, is a very vague term. What we've been doing for 30 years at Wolfram was called AI back in the early days, when computers were able for the first time to do things like symbolic integration or solve equations in the way that humans used to do on pencil and paper — that was called AI. These days the term tends to be applied to things that seem to involve fuzzy reasoning and linguistic-type reasoning, but underlying all of those mechanisms are combinations of pure logical programming and machine learning. The logical programming is, in the end, human-curated instructions that describe exactly how processes should work; algorithmic things like symbolic integration fit into that category. Machine learning covers those areas where we are data rich and understanding poor: where we have lots of examples of what we want, but we don't really know what the rules are. That has been the underpinning of the whole modern view of AI in the last 10 or 20 years. So machine learning means taking data and building a model.
Where in the old days we might come up with an equation that describes the motion of the planets and then use that model to predict the future, now we simply take data and get the computer to come up with the model. So the idea is that we have some data points, and at its very simplest level you can imagine fitting a curve to some points of data as the very simplest kind of machine learning, because the parameters of that curve had to be figured out by the machine from the data. In a much more modern rendition, we're taking things like images and building a very complicated model — much more than an equational model — that can then predict the outcome for new inputs: in this case, whether images are pictures of dogs or cats.

Now, machine learning breaks down into a whole collection of different classes, but here are the most important ones. Supervised learning is where you give the computer the supervision of knowing what's called the ground truth — what the correct answers are. So in my previous example, if this was the input data I'm providing to the computer, the fact that I have labeled that picture as a picture of a cat is my supervision: I'm telling it this is an example of a cat, this is an example of a dog, and that's the human supervision. It's particularly useful, of course, when you have labeled data — things like class recognition and predicting values, where we have examples of what happens that we want to replicate.

Unsupervised learning is the same thing but without the labels. You simply provide a collection of data, but you don't really know what the true answer is; you're just looking for patterns. A typical thing in that space is to look for clusters — to say that people who bought products like you did also bought these other products. No human has said that you are, say, a middle-aged man with a particular hobby, but it has identified that you are part of a cohort with similar behavior, and that's an example of unsupervised learning.

Then there's a higher level, which is reinforcement learning, where you get the model to gain its own feedback from the environment. Notable examples have been things like teaching computers to play Go, where rather than just feeding it millions of examples of Go games, it is effectively playing itself and learning how to play by generating its own data for its own supervised and unsupervised learning — basically closing a feedback loop on the other two approaches.

Some of the typical tasks we have in this area: classification — can we say what kind of thing something is, which group it belongs to, like whether pictures are of one thing or another. Prediction is another important task: given some inputs, can we make a numerical, quantitative prediction as output. That might be a single variable, like predicting the height of a child based on age and sex, or it could be multiple values; the difference is that one is discrete — there are fixed categories, cats and dogs — and the other is continuous, where the value can be anything. Other examples are things like creating mappings — a common one is text translation, where we figure out how to take one space of data and map it to another space, the other language for the same phrase. We can predict future time series, and we can try to spot things that don't look like they belong — almost the opposite of classification and prediction — which is finding outliers.
We can do things like data clustering, to try to form groups and cohorts that might have similar behavior or could be treated in similar ways. More modern things are chatbots, where I can have a conversation with a computer and it gives me human-seeming, intelligent responses. The underlying mechanism in all of these is machine learning: in the end we're taking some data from the real world to learn a model, we're putting some data in, and we're getting some data out — and that data can be any kind of data, from very high-dimensional things like images and video to very simple things like numbers.

Now, I'm going to show lots of examples that have code, but don't worry — all of the code examples are extremely simple. I'm just going to give you a little overview of the Wolfram technology so you're not distracted by the notation. I'm working in a notebook, so inputs and outputs are in the document here; I'm not going to be working from the command line or running scripts, so you'll just see outputs appear below inputs.

Within the Wolfram Language technology stack we have all kinds of functionality beyond machine learning — all kinds of computational language technology — but it is all accessed the same way: we have commands that have a name, so if I'm adding two and two, the command might be called Plus, and then arguments that are separated by commas with square brackets around them. Whenever you see a name followed by square brackets, that's just a function in the Wolfram Language, and everything is a function. We will see data in a couple of forms: curly brackets are a way of denoting a list of numbers, and higher-dimensional data is just lists of lists — there's no limit to the depth; here we've got a list of lists of lists, so three-dimensional data, and it doesn't have to be rectangular — it can be ragged, like this example. We also have a key–value notation, like a dictionary in Python or a struct in other languages, for data that isn't ordered the way a list is (with a first and second element) but is instead labeled by a key — the key here is "name", the value is a string, and these special brackets denote that this is an association of key–value pairs. With that, you have essentially all you need to understand the code that we'll see.
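To make that concrete, here is a minimal sketch of the notation I just described (the example values are placeholders of my own):

```wolfram
(* a function call: a named head, square brackets, comma-separated arguments *)
Plus[2, 2]          (* gives 4, the same as 2 + 2 *)

(* curly brackets denote lists; lists of lists give higher-dimensional, possibly ragged, data *)
{1, 2, 3}
{{1, 2}, {3, 4, 5}, {{6}, {7, 8}}}

(* key -> value pairs collected in an association, like a Python dict *)
<|"name" -> "Jay", "age" -> 42|>
```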
So let's get into the tools provided within the Wolfram Language that make this, hopefully, very accessible for you to get started on. If you want to follow along, you can download the notebook — the link is in the chat panel on the right-hand side. If you don't have the Wolfram Language, you can either download a trial version for your desktop or open a free cloud account at wolframcloud.com and access it through a web browser. That's free of charge, although it has some limits on memory, CPU time, and retention of data you store in the cloud unless you subscribe — but either way, you should be able to play around with this after the webinar free of charge.

There's a whole collection of commands — around 5,000 in the Wolfram Language — and we're going to focus on a few dozen today. The most notable ones we'll look at first are Classify and Predict, which are the completely automated machine learning commands, and ClusterClassify and FindClusters for finding clusters; we'll talk about the rest as we hit them. The language here is very task oriented: you'll notice we call things Classify, not KNearestNeighbors — the algorithm is a detail, so we try to label things by what they're trying to do, and the name of the function is the key to understanding its purpose. We'll also look at a much richer and more complicated neural network framework, and we'll just lightly touch on that for more expert users.

So let's look at some quick examples of using these built-in commands. First of all, the Classify command actually comes with some pre-built machine learning classifiers, so you can use it with the name of a pre-built classifier — in this case a language-detection classifier — and then the argument we're giving it is this string: what language is this? We're asking for the probabilities of what's going to come out, and it says the highest probability, at 99.99%, is English. To force it to tell me some of the more obscure probabilities, let's ask for the top three: there's a tiny possibility it's a language called Tagalog, or it could be Afrikaans at an even smaller probability. So it's making its class prediction; if I don't ask for these extra bits of detail, it simply makes its prediction and says that looks like an English sentence. There are about half a dozen of these standard built-in classifiers within the Classify command — things like profanity detection, topic classification, and name–gender detection are pre-built in. Mostly, though, this becomes useful when you want to build your own classifiers, and we'll talk about that in a minute.

At a higher level we have a purpose-built image identification function that switches between various more advanced algorithms for identifying what's in pictures. Here I have two images that I'm entering into the system; I apply the ImageIdentify function to each of them, and it recognizes a grey wolf and some key limes. Now, the neural network framework I mentioned is a way of fetching arbitrary neural networks from a repository, so we can do the same image recognition task with a specific net model called Inception — I think it's a Google-trained model. I've deliberately mislabeled these things; let's see what it has to say about images one and two. It gives the same answer on the first one, but the second one I would guess it has got wrong, since they look like limes to me — though maybe it knows better and these are unripe lemons.

Other high-level commands built into the language include things like image restyling: if you're doing image processing, you can take a source image like the Mona Lisa and a style reference image and say, restyle the Mona Lisa as if it were done by Kandinsky — it takes the style of one and maps it onto the other — or a Monet version of the same thing.
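Here is a rough sketch of the kinds of calls I just showed; the repository model name and the property syntax are as I remember them, so they may differ slightly in your version:

```wolfram
(* a pre-built classifier: detect the language of a string, with the top few probabilities *)
Classify["Language", "How are you today?", {"TopProbabilities", 3}]

(* purpose-built image recognition, where img is any Image you supply *)
ImageIdentify[img]

(* fetch a specific trained network from the Wolfram Neural Net Repository and apply it *)
inception = NetModel["Inception V3 Trained on ImageNet Competition Data"];
inception[img]
```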
Another high-level set of useful machine learning tools built into the language is to do with natural language processing. That can be at quite a structural level — I can take a sentence and decompose it into what's the noun, the subject, the object, and separate the sentences — or at a much more content level. In this case I'm fetching the Wikipedia page on the Moon, and I want to search that text for all the things mentioned that are of type quantity, and figure out what those quantities are. It's using a machine-learned content recognizer to find the phrases that look like quantities — like 10 meters or 100 years — and then interpret them into the Wolfram Language representation of units, so we're not going to get the word "meters"; it turns that into a proper unit object automatically. This TextCases approach works over everything that the Wolfram|Alpha system knows about, so I could search the text for cities that were mentioned, or countries, or notable people, or rivers if there were any — I doubt there are any rivers mentioned in the Wikipedia article about the Moon — but all of these entity concepts of the real world can be recognized in text. And, hopefully this won't take too much longer... here we go: here are all the quantities mentioned on that page, together with the units they're in.

Another high-level thing that's built in is anomaly detection. The idea is that I can give it some data — in this case a set of colors — and it tries to figure out which one doesn't follow the pattern of the rest. Since these are all rather secondary, muted, pastel shades, it has decided that bright red and bright green stand out as anomalous. This works even better if you can pre-train on data that you know to be correct. Here I'm fetching some data from the MNIST dataset to build an anomaly detector function for characters — the MNIST data contains pictures of handwritten digits — and now I can apply that pre-trained detector to a set of example images. It has spotted the two things that don't look like handwritten digits: one because it clearly isn't, and the other because it's so corrupted that it doesn't fit the quality of the rest of the dataset. And of course these detectors are retrainable.
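A sketch of those last two examples — the resource element name and the sample count are my assumptions, so check them against the actual MNIST resource:

```wolfram
(* untrained: flag elements that don't follow the pattern of the rest *)
FindAnomalies[{LightBlue, Pink, LightGreen, Red, LightYellow, Green}]

(* pre-trained: build a detector from known-good MNIST digit images, then apply it *)
digits   = Keys[ResourceData["MNIST", "TrainingData"]][[;; 1000]];
detector = AnomalyDetection[digits];
FindAnomalies[detector, testImages]   (* testImages: your own list of candidate images *)
```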
So let's talk about the process, and — if you're going to make your own — the bits you need to understand. The first thing, which in simple cases you won't have to concern yourself with but which matters as things get more advanced, is this: in these examples I've been jumping around all kinds of data — images, text, colors, more images — very different kinds of data. The way the machine learning frameworks we've built all work is that you first have to convert those into a vector of numbers, and that vector of numbers is the set of features we are going to look for patterns in. If the thing is already in numerical form, then of course you're already there — the number, or vector of numbers, is the feature set already — but very often we have to do some level of feature extraction to turn something into a vector we can process. We deal with this at a high level with the FeatureExtract command. There's a choice of algorithms behind it and lots of details you can control, but the basic idea is to take some samples of data — here I've got a string, a color, and a date, and several examples of each — and it examines them and converts each into a simple feature vector of three numbers.

Normally when you're doing machine learning you're dealing with quite large amounts of data, so that proper patterns can be found; I've given it hardly any information here, so it has done some rather naive things. It has decided this is a phrase of some kind and mapped it to a number — clearly one number is never going to represent the entire richness of human language — and in fact this number doesn't correspond to this item on its own: it's the first feature of a set of three that represents the whole example, so it has taken the whole example and mapped it into some three-dimensional space. These things start to become meaningful when you have larger amounts of data, and when you have small data we accelerate the process by leaning on pre-trained feature extractors that make them make sense. The idea is that FeatureExtract works over any kind of data: if I give it this sort of key–value dataset, it converts the entries into pairs of numbers, and if I give it these much more complex things — graphs — it turns each one into a vector of, in this case, six numbers.

It's useful to imagine that these numbers represent something descriptive. You could imagine, for example, that for this graph the first number is something like how dense it is — this one would have a higher number than this one — or how pointed it is, or how connected it is, or how central the most central point is: something we might have words for. One of the things you have to accept fairly early with machine learning is that it invents its own concepts. While we might look at a picture and say "that's a cat because it has pointy ears" — "pointy" being our version of extracting a feature — in its world it will invent some concept that might best be described as "zigzaggy splodge" or "kind of prickly-jagged": a concept it has invented that may not even have a word we would recognize. It builds this language for itself and then ascribes numbers to it. So these features are important but not very easy to understand.

One way we can understand them a little better is to visualize them, because sometimes that gives you a sense of what the extraction means, and I also want to show you what this looks like at a bit more scale. So let's work with a kind of semi-real dataset — and I should run this straight away because it will take a minute to run. I'll apologize in advance: I'm doing live web image searching here, so if the internet throws back anything inappropriate, I apologize. What I'm searching for is three characters from the Guardians of the Galaxy franchise — Rocket Raccoon, Groot... sorry, four characters — Gamora, and Star-Lord — and I'm doing a web image search for thumbnail pictures, just to keep the amount of data small for this demo. I'm going to get 20 images of each, so in a few seconds hopefully we will have generated 80 fairly random images that match those image searches.
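A sketch of that data-gathering step; WebImageSearch needs a configured search service and credits, and the option names here are as I recall them:

```wolfram
(* gather ~20 thumbnail images per character, keyed by the search term *)
characters = {"Rocket Raccoon", "Groot", "Gamora", "Star-Lord"};
images = AssociationMap[
   WebImageSearch[#, "Thumbnails", MaxItems -> 20] &,
   characters];
```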
We might find that some things aren't correctly found by the search — maybe instead of Rocket Raccoon from Guardians of the Galaxy there's a picture of actual raccoons in there — but if you have enough data those kinds of things tend to average out. I think we're done, so let's fetch those and have a quick look at what we've got. Here's our dataset: there's Rocket Raccoon, some Groots, and so on. Let's hide those again; we don't need to see them.

In this case I was going to show a random sample... so here it is: I randomly sample the whole lot, which reorders the images so they're not in an obvious pattern, and we're going to do feature extraction on them. Now, it's going to do more than just map each image to some numbers — it's also going to do a certain amount of feature reduction. We accelerate the feature extraction with things learned from looking at millions of other images, so it knows to look for things that are, say, pointy or square or round, or certain textures — things that turn out to be significant features in the real world — and then it reduces that. The idea of dimension reduction is that we're trying to get fewer numbers that represent the essence of what we've got. If every one of my images contained the same texture at the same value, there would be no point keeping that number, because it doesn't describe the differences between images. In another context: imagine I was trying to predict the breed of a dog from its height, weight, length, and color. Because dogs are basically all the same aspect ratio, once you know the height you know the length, so it's not worth retaining both numbers — it might be enough to keep just the length, because the height is inferable from it. In practice, dimension reduction takes the whole space and squashes it down to a lower-dimensional space, combining different features together — so while it's nice to think of it as throwing out the numbers that aren't very useful, it's actually doing something a little more complex.

The basic idea is that we can sum up these images — which are all, I don't know, 100 pixels by 100 pixels, so 10,000 values each, or I guess 30,000 values because they're in color — and we have squashed each one down to a vector of... let's just find out how long that vector is... 78 numbers. Those 78 numbers are starting to capture what's important to sum up each picture compared with the other pictures we saw in the data.

One thing we can do with that, for example, is go through the same process but build a nearest function that retains every image together with its extracted feature vector. Now, when I take a new picture, I can compute its feature vector — the 78 numbers — and find the vector in that list that is closest to it; that point in the space corresponds to the picture that has the most similar set of values within the feature space. If we're extracting useful features and reducing the dimensions appropriately, what we get is something that looks similar — not similar in the sense of facing the same direction, but with the same kind of features in it as our sample.
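A minimal sketch of that, assuming allImages holds the 80 thumbnails gathered earlier:

```wolfram
(* learn a reduced feature representation from the images themselves *)
extractor = FeatureExtraction[allImages];
extractor[First[allImages]]        (* a short vector of real numbers for one image *)

(* find the training images whose feature vectors are closest to a new picture *)
nearestImage = FeatureNearest[allImages];
nearestImage[newPicture, 2]        (* the two most similar images to newPicture *)
```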
And again, here I'm asking what's nearest to this picture, and these are the two that are closest in that vector space of features. Now you can start to see how these rather abstract numbers become useful: we have a measure of how close something is to something else, we can find the nearest thing, and we can start finding clusters of things that are near to each other — all of which comes out of this idea of a reduced feature vector for the content.

One very useful thing that I do as a kind of first step is to take this to the extreme: take those pictures, squash them down to a feature space of just two numbers, treat those as x–y coordinates, and throw the pictures onto a plot at those positions. As a first step with any machine learning problem, this usually gives me some sense of plausibility — whether there is something useful to be found in those features, or whether the features are irrelevant. If all we extracted was, say, the value of the top-right corner pixel, that might turn out to be completely irrelevant. What we can see at a glance here is that clusters start forming quite naturally: down here is a whole collection of raccoons together, and we've got a whole collection of Groots together over here. I'm not seeing the Star-Lords grouped, but I'm guessing they're in there somewhere, and it looks like a lot of the cartoon-styled images are together. When you start seeing clusters forming in a feature space plot — things you recognize as similar appearing close to each other — that's a good sign that this approach is going to work. The plot itself is not terribly useful except as a qualitative measure of whether our features are doing something useful, but it's a good first step, and all it's doing is taking our vector space, squashing it down to two dimensions, and plotting the labeled points.

Right, let's go through the whole process now — what we need to do for a typical machine learning project from beginning to end. We're going to do classification here, on those images. First of all, we have to prepare the data. I'm going to largely skip that, but all data science projects end up having to deal with the mess of the real world: your data has been badly entered, it has been labeled badly, and you may have to do some massaging to get the labeling right, throw out bad values and things that don't belong, and decide whether the data was fundamentally biased and whether some subset or resampling will deal with how badly it was collected — all of those background things. One key part, though, is that you should split the data into a training set and a test set. It's just like the way we deal with exams in school: we want to see if somebody has understood their subject by giving them an exam, and there is no point giving them an exam on the exact questions and answers they've just revised. You have to give them questions that are slightly different, because you don't want to test whether they memorized a particular value — you want to find out whether they can generalize what they've learned to something they haven't seen. So when we get to testing whether our work has worked, we want to make sure we haven't shown the classifier the exam before we run it, and we hold back some of the data — typically 5 or 10 percent would be a good amount.
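As a sketch, assuming the images association from the earlier search step (name -> list of thumbnails):

```wolfram
(* plausibility check: squash the features to 2D and look for natural clusters *)
FeatureSpacePlot[allImages]

(* label every image with the term it was searched under, shuffle, and hold back a test set *)
labeled = RandomSample[Flatten[
    Table[img -> name, {name, Keys[images]}, {img, images[name]}]]];
{training, test} = TakeDrop[labeled, 60];
```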
We then create the classifier — which means converting the source material to vectors, training a classifier, and storing the trained classifier — and then we test that trained classifier on our held-back test data and measure how well it works. If it works, it's good; if it doesn't, it's not. This isn't really about checking your working: in this world we can't be sure what or why the model decides, so we have to base our confidence on testing whether it appears to do the right thing in useful situations.

So I'm going to take that data and do the prep step: I've paired up the character images with the labels we searched on, because we're doing supervised learning here — we're giving it the answers. The question is "what is this?", and the answer is "Rocket Raccoon". Then we do our split: the first 60 elements go into the training set, and the last 20 go into the test set. Now we train on the training set. As I said, when things are simple — and by simple I mean things with a one-dimensional output, and particularly where we have a feature extractor already built into the system, so standard data types like images, text, time series, sounds, colors, graphs, and video — all we have to do is say Classify. You can see here that it has automatically figured out that the inputs were images and that there are four classes of output, and there are some hidden things going on: it has chosen the algorithm automatically. Classify has access to about eight standard machine learning approaches; it has decided that logistic regression is the most appropriate here, and it has trained on 60 examples. We've stored the result with a name so I can refer to it later.

Now let's take a test value that it hasn't seen — well, hopefully it hasn't seen this one — apply the classifier to it, and it's got it right. You can also give it the "Probabilities" argument to ask what it thinks this is: it thinks, at near 100%, that it's Star-Lord, with effectively no chance of it being any of the others. It's important to make a distinction here: these probabilities are really a measure of the classifier's confidence, not of the probability that the answer is correct. If we want the latter, we need further calibration on some unseen data to convert these numbers into probabilities of correctness, because it could be that the classifier simply hasn't seen enough examples to understand that there's ambiguity between different things — or it's possible that it has actually seen this example before, which would also make it very confident. Hopefully not; I could test, but it doesn't matter too much.
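Here's the shape of those steps as code, a sketch using the training and test lists from the split above:

```wolfram
(* train a classifier on the labeled images; features and method are chosen automatically *)
characterClassifier = Classify[training];

(* apply it to an image it hasn't seen, then ask for its confidence in each class *)
unseen = test[[1, 1]];                   (* the image part of the first test example *)
characterClassifier[unseen]
characterClassifier[unseen, "Probabilities"]
```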
Now we can take our test set — I've asked for the first five examples — and see how those go. We know these weren't in the training data, unless there were duplicate images, which I didn't deduplicate (that's part of the cleanup one might want to consider). Here it's got this one wrong — we can see that it's not Star-Lord; that one it's got right, this one wrong, that one right, and that one wrong. So it's done pretty poorly this time. There's a little bit of non-determinism in the training, and obviously a lot of non-determinism in the images I've fetched; when I play with this example it normally gets about four out of five right. If I try examples six through fifteen it might do better — let's give it another chance: correct, can't tell (not sure what that one is meant to be), correct, correct, correct, correct, correct, I think correct, wrong, correct. So it's done a better job on those.

But we don't want to be anecdotal about this; we want to be systematic, and the Wolfram Language provides a whole collection of tools for that. We take the classifier we built and the test set we held back, go systematically through every element of that test set, calculate what the model says it is, and compare that to what we know it is — because the test set has truth values attached, assuming those were correct — and we get back a set of important metrics. We can access these individually, but the key ones all appear on the main panel that comes back. The headline here is 75% accuracy, meaning three quarters of the test examples were classified correctly.

Now, that's very contextual; there are two things you have to weigh up to decide whether that's good. One is: if you knew nothing, how many would you get right? There were four classes, equally sampled, so if you just said "Groot" every time you'd be right about 25% of the time — that's the number we're trying to beat, and 75% is obviously much better. The other contextual thing is the purpose we're going to put this to: if we're suggesting tags for an image, that seems pretty useful; if this is a self-driving car, a 5% failure rate is a disaster. So we have to think about the costs and benefits of using a predictor. Actually, it has decided its accuracy baseline is 35% — I'd have to think about why that is not 25% — and that's the comparison it thinks we should be making. There are about 40 different measures if you want to get technical, but the next most important one is this confusion matrix plot. Let me make it a bit bigger so we can look at it properly — oops, caps lock, that doesn't help.

This tells us the pattern of failures. We have the actual classes on the left and the predicted classes along the top, and what we want to see, ideally, is everything on the diagonal — whenever it's actually Gamora, it predicts Gamora. We can see that for two of the classes there's lots on the diagonal: when it actually is Star-Lord it gets it right every single time, although it slightly over-predicts Star-Lord, because when it predicts Star-Lord it was only correct six out of eight times — sometimes the image was something that wasn't Star-Lord. This gives a sense of the nature of the failures, because sometimes you get very systematic failures — a particular confusion between two specific things — and that tells you, again from a cost–benefit point of view, whether it matters: if a failure is more likely to cause something safe by accident than something dangerous, you may be able to ignore it.
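A sketch of that systematic evaluation, using the classifier and held-back test list from before:

```wolfram
(* compare predictions against the known labels of the held-back test set *)
cm = ClassifierMeasurements[characterClassifier, test];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]
cm["WorstClassifiedExamples"]   (* handy for spotting mislabeled or polluted data *)
```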
It can also guide our future data collection: if the model is always getting confused between Star-Lord and Gamora, we need more examples of those two classes to help it learn the distinction, and we don't need many more examples of the classes it already gets right. We can also look at things like the examples it was least confident about, or most confident about, in case that tells us the data has been polluted — for example, if we look at the worst-classified examples and realize they don't belong in the dataset at all, that sends us back to improving our data collection techniques.

Now, I said that this Classify command has a whole bunch of different methods, which raises the question of which one it uses, and when we should use different ones. Here, as a fresh example, I've got some numbers that map to colors — maybe this is something like a geolocation in the world and the dominant color of the flag there, or something like that — and we want to make some kind of prediction. What we get by default is Method set to Automatic — you don't have to say it, but I can say it explicitly — which is why it has decided on its own to do something, in this case naive Bayes. We can control that if we want to: I can say I don't want Automatic or naive Bayes, I'm going to use, say, gradient boosted trees as the method, and now we've forced it to use a different method, which will have different characteristics.

If we look — let me make this a little smaller on screen — at a set of original data where we're trying to fill in the spaces with our predictor, you can see that depending on the method used you get quite different behavior as you cross from one area to another. Nearest neighbors is basically saying: if we're looking here, look at the things that are nearest, and they're all blue, therefore it's blue. Logistic regression is fitting trend curves — it tries to find lines across the data that switch between zones in a smooth, logistic, true/false kind of way. Support vector machine — I'm trying to remember the details... I forget the details. Random forest builds decision trees in random orders, sees how many of them make good predictions, and discards the poor ones. Naive Bayes treats the variables as somewhat independent — spam filters work very much like this: if I get email that says "buy cheap Viagra", "Viagra" is a negative word and "cheap" is a negative word for spam; if I get email that says "Mathematica" or "webinar", those are positive words — but it treats them independently, so if I had a webinar about Viagra, those things would cancel out in the naive Bayes approach. And neural networks are the richest approach, which we'll spend the last third of the webinar on.

So how do you choose the method? I'm going to skip through this example, otherwise I'm in danger of massively overrunning on time, but basically you try each one — possibly on part of the data if you have very large amounts — and then use these measurement approaches to decide which one is doing best.
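In code that comparison looks roughly like this (a sketch reusing the earlier training and test sets):

```wolfram
(* override the automatic method choice and compare accuracy on the held-back test set *)
methods = {Automatic, "NaiveBayes", "GradientBoostedTrees", "NearestNeighbors"};
classifiers = Classify[training, Method -> #] & /@ methods;
ClassifierMeasurements[#, test, "Accuracy"] & /@ classifiers
```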
In fact, that's effectively what Classify does when Method is Automatic: it starts to train a little bit of the data on each method, measures their performance, and gradually discards the methods that are performing poorly until it ends up with one method left that seems to be the winner, and then it puts all of its training effort into that. For this small training set it probably trains all of them completely, but for a large training set it does this with a reduced sample. The same goes for all the sub-parameters — each of these methods, such as gradient boosted trees, has half a dozen parameters, a boosting rate, a gradient measurement function, and so on, that I can spell out — and again, trial and error plus measurement is the only real way to figure out how to tweak them for the best performance.

Now, prediction — I'm going to go through this very quickly as well, because Predict, our high-level command for predicting numerical values, works exactly the same way Classify does. I've got this dataset of about 20 properties of house prices in Boston — a classic old training set — and here are the first ten entries. I do exactly the same thing: I split the data into training and test, and then train just as before, except the syntax is slightly different: I'm saying I want to predict the house price — "MEDV", I think, was the label for the median value — so we take our training data and name the value in it that we want to predict, given the other properties, plus an option here to tell it to try hard. Now I can give it some unseen values, and it predicts a price of about $19,000 — as I said, the data is very old, so you'd be lucky to find anything at that price in Boston now.

The measurements follow the same principles, although the actual measures are different for continuous data: we look at things like the standard deviation — our predictions were plus or minus about two and a half thousand dollars — and we want our data to lie on this comparison line if possible, so we can start seeing that the biggest outliers are for the very high prices, and there doesn't seem to be a great asymmetry. So we get qualitative measures from that, and again there are about 20 measures we can use if we really care about the details. So in that world — where you've got a number to predict or a class to predict, or, although I didn't show it, you just want to make clusters — those things are really fully automatable.
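A sketch of that workflow; I'm assuming a data repository item and a column name ("Sample Data: Boston Homes", "MEDV") which may be named differently in your setup:

```wolfram
(* Boston housing data: predict the median value column from the other properties *)
boston = Normal[ResourceData["Sample Data: Boston Homes"]];
{trainData, testData} = TakeDrop[RandomSample[boston], 450];

housePredictor = Predict[trainData -> "MEDV", PerformanceGoal -> "Quality"];
housePredictor[KeyDrop[First[testData], "MEDV"]]   (* predict for one unseen example *)

pm = PredictorMeasurements[housePredictor, testData -> "MEDV"];
pm["StandardDeviation"]
pm["ComparisonPlot"]
```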
Unfortunately, the really sophisticated, fun end of machine learning isn't fully automatable, and there the winner by far has become the neural network approaches. Neural networks are a way of describing a model that has a lot of richness, but some elements of the model are architectures that have been designed, through deep human thinking, for specific tasks. This is the lexicon of the language of neural networks: most of these things represent trainable layers that we can assemble to build a network. Some are very generic, like an elementwise layer or a fully connected layer; some are quite specialized — convolutional layers, for example, tend to be very good for image recognition. I'm not going to go into how you design these things; I'm going to skip over that completely, and we'll use some that have been pre-built and show you how to use and modify them.

So we saw this one earlier: this is the LeNet model for image character recognition. It has a compact representation in our notebook, but behind the scenes it has 11 different layers. A convolutional layer is a kind of image feature extractor, some of the layers are purely structural, things like the linear layer are about combining features, and the softmax layer is a probability layer — these have been assembled by somebody into a very simple image recognition network. Let's get some data to apply it to: these were the characters we had before, and we can simply apply the network to one of the images to get a prediction. Here's an image that has come out of the dataset, and when I apply the network to it we get back a prediction — it's not a picture any more, it's an actual number, so I could add something to it, for example.

In terms of using them, we've tried to make that easy by giving them names you can fetch with the NetModel command, and if you go to our Neural Net Repository on the web you'll find all kinds of neural nets for different purposes — image segmentation, image content recognition, speech recognition, all of these kinds of tasks — as ready-to-download networks. You simply fetch the thing by name and apply it to the data, so from a usage point of view we've tried to make it completely seamless.

Now, if we want to retrain a network for the same task, that's really a one-step process in most cases. Let's take some new data, which is a blurred version of the old data, and apply the network to it. It still does a reasonably good job with a little blurring, but you can see it's not doing quite so well: the blurred four is looking more like a nine to it, and the six is looking like a zero. If I want to train it with blurry data, then — in a similar way to Classify — we have NetTrain, which does the job of Classify and Predict but takes as its first argument the network we're trying to train and as its second argument the data we're training it with. Here I'm just doing a bit of data preparation to label the data, and it retrains the network. This is a lot slower — you can see that without a GPU it's taking a good few seconds — and this curve is the error between the examples it's looking at and its predictions; as it tweaks its weights and learns, that error gradually falls, and at some point it reaches a level where it decides to terminate, or I can click the stop button and end it early. Oh — it's made some breakthrough discovery — and now it seems not to be doing any better, probably because I don't have enough data for it to learn any more from, and it will run to the end. Now if I apply it to that blurry data it does a good job — partly because I've cheated here and used the same data: I didn't do the train/test split that I should have done. By default, NetTrain starts from all the training parameters of the net as downloaded, so they've been adjusted but not completely reset, which means it still does a good job on the original data — it hasn't forgotten everything — but over time, with this approach, it will gradually forget what it was originally trained on unless you mix your new training data with the data it was originally trained on, so that it preserves both kinds of knowledge together.
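A sketch of fetching and retraining that network; digitImage, digitImages, and labels stand in for the handwritten-digit examples and their true values:

```wolfram
(* fetch a small pre-trained digit-recognition network and apply it *)
lenet = NetModel["LeNet Trained on MNIST Data"];
lenet[digitImage]                     (* returns a digit, 0-9 *)

(* retrain it on blurred versions of the same digits *)
blurredData = Thread[(Blur[#, 2] & /@ digitImages) -> labels];
lenetBlurry = NetTrain[lenet, blurredData];
lenetBlurry[Blur[digitImage, 2]]
```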
Now, we can modify these networks as well. If you're dealing with a task that is similar to an existing network's task but not the same, these things are all adjustable, because a network is a symbolic object that is computable just like any other piece of data in the Wolfram Language. By way of example, let's get a much bigger model out — this is the one used by our ImageIdentify command, but here I'm getting the actual neural network — and you can see there's a bit more to this one: it has twenty-odd outer layers, but some of those are graph layers with layers inside them. If I click on this one, you'll see that this single layer in fact has 23 layers within it, so there are actually two or three hundred layers in this network — more than I'd want to code by hand, but that doesn't mean I can't transform it.

One thing I could do, for example, is take just the first ten layers — throw away the bottom end of the network — and then see what's going on inside at that point by applying it to the image. I get back a bunch of matrices, and to understand them I turn them into images. What we're doing is getting a glimpse inside the brain of the network at level ten, and we see some of the features it has detected. Whatever this bit of the network is looking for, it found no features that matched; you can imagine some of these might be picking out the stripes of the tiger — I don't see any that look much like diagonal stripes, but maybe if we looked closer to the beginning of the network, at level three, some of the more raw features would be identifiable as things we recognize in the real world.

This also lets you do transformations: I can replace parts of this model with different outputs. Let's do this live. The basic idea is that you do a NetReplacePart of the network, giving a list of the parts we want to change. Let me get the whole network back so I can see what I'm doing: I'm going to change this last linear layer to be size three, so that instead of predicting from 4,315 things like tigers and cats and dogs, I could do rock–paper–scissors — I replace the linear layer with a fully connected layer of size three. I also have to do something with the encoding and decoding: the encoder is okay, because the inputs are still images, but for the decoder I want a new output classification, which is a class NetDecoder, and I tell it what the classes are — I'll just call them rock, paper, scissors. It's showing red and telling me I've got the syntax wrong, because I have one too few curly brackets... and another bracket at the end... still a bracket missing... there. What I end up with is a new neural net that ends in a vector of three, with my new classification of rock, paper, scissors. You can see everything's black here apart from the linear layer: those layers have remembered everything they know about tigers and cats and dogs, while the linear layer is brand new, so it needs training, which is why it's shown in red. Then I would do a NetTrain of that against pictures of my fist making rock, paper, and scissors shapes, and it would learn to recognize them.
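A sketch of that surgery; the final-layer name "linear_5" is a placeholder of mine — inspect the net to find the real name of its last LinearLayer:

```wolfram
(* look inside a large pre-trained image network *)
net = NetModel["Wolfram ImageIdentify Net V1"];
partial = NetTake[net, 10];                              (* keep only the first 10 layers *)
ImageAdjust[Image[#]] & /@ Normal[partial[tigerImage]]   (* view the level-10 feature maps *)

(* repurpose the network for a new 3-class task: rock, paper, scissors *)
newNet = NetReplacePart[net, {
    "linear_5" -> LinearLayer[3],    (* hypothetical name of the final LinearLayer *)
    "Output" -> NetDecoder[{"Class", {"rock", "paper", "scissors"}}]}];
(* newNet would then be retrained with NetTrain[newNet, rockPaperScissorsExamples] *)
```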
This kind of transformation lets you take things from the repository, find something close to your task, and tweak it to the task without having to understand all the details of why there's a NetGraph layer in the middle — just a few basic bits of understanding about the final output layers: that softmax is the probability layer, and that the linear layer aggregates all the features together to make a final decision — and then make that change. Have we got time to discuss this? I think we do, yes.

So let's spend ten minutes on one key problem you have to worry about when you're building machine learning models, which is the concept of overfitting, and its opposite, underfitting. What we're not trying to do is build a model that simply remembers what it has seen — there are faster ways to look up data and ask "is this an example we already know?" than building a neural network. We want it to generalize, so that when it sees something new it can take what it has learned about the features, and the relationships between them, and make a useful prediction.

With underfitting, you've failed to capture enough of the essence of the problem, and the model just makes bad predictions. That's what this first example is doing: if we only allow a straight-line fit to this curved data, it can't capture the complexity — a straight line just doesn't have enough richness to it — and it will underfit and make pretty poor predictions in places like here and here; it might happen to get lucky in some places, but on the whole it does a poor job. A good generalization will look somewhat similar to the underlying model that generated our data, although it will never match it exactly, because there is always noise and randomness in the real world. But if we give the model too much freedom, we end up with overfitting: a model so rich that it can pass through all, or nearly all, of the data points — and you can see that in some places it's a much worse fit to the underlying behavior than our good model here. So this is also a terrible model, but for the opposite reason: it's trying to remember the data rather than capture the essence, which means it's effectively trying to model the randomness — something it can never capture, because it is random; philosophically speaking, it's driven by features we don't have in the input set. It ends up modeling random fluctuations that mean nothing, and those get exaggerated as you overfit.

So let's talk briefly about a few ways to approach that, and I'll make a toy example. I'm going to make some data that is roughly an exponential curve, but I'm throwing some random noise into it, and there aren't very many data points — here's the data. And I'm going to make my own neural network, coded up from scratch: a linear layer of size 50, a nonlinear layer with a tanh activation function — mixing linear and nonlinear is what gives it the ability to do something complex — then another linear layer and another nonlinear layer.
Then we have one output: the input is a scalar and the output is a scalar, so it takes a number and gives a number, but it has over a hundred degrees of freedom inside — these size-50 linear layers turn the input into a vector of 50, and this one is a 50×50 matrix — so there are all kinds of controls it can adjust. We train that, and off it goes. Hopefully my random numbers have been kind here; also, for consistency, I've deliberately fixed the training method to one called RMSProp, because the default method is actually a bit too good at preventing overfitting, so I've picked a slightly weaker but still efficient one. Now let's see what its prediction does.

You could say this is a great model, because our underlying data all lies very close to the curve the network has generated for its predictions — and if this low feature right in the center were a real thing, the fact that the curve dips downwards here would be capturing that knowledge. But in this case we know what the underlying model is: before I added random noise, it was this exponential curve with x going from minus three to three. That's the truth we sampled data from before throwing random noise in, so this feature in the middle is entirely imagined. The model has given too much weight to the noise — the fact that one data point here was surprisingly low — assumed that's a feature, and captured it. This is overfitting: we're not getting the underlying essence that I happen to know is there; the model is capturing the noise. And while it might look good this time, if I generated another set of data and compared, you'd probably see that it does a very poor job.
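Here's a sketch of that toy setup; the noise level and sample count are assumptions of mine, and the pattern follows the standard scalar-in, scalar-out NetChain idiom:

```wolfram
(* noisy samples of an exponential curve on [-3, 3] *)
data = Table[x -> Exp[x/2] + RandomVariate[NormalDistribution[0, 0.3]],
   {x, RandomReal[{-3, 3}, 40]}];

(* a deliberately over-flexible network: alternating linear and Tanh layers *)
net = NetChain[
   {LinearLayer[50], ElementwiseLayer[Tanh],
    LinearLayer[50], ElementwiseLayer[Tanh], LinearLayer[1]},
   "Input" -> "Scalar", "Output" -> "Scalar"];

overfit = NetTrain[net, data, Method -> "RMSProp"];
Show[Plot[overfit[x], {x, -3, 3}], ListPlot[List @@@ data]]
```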
There are three basic methods I'm going to introduce to deal with this. One is called regularization — there are different versions of it, but I'm going to use something called L2 regularization. I do exactly the same thing as before — the same network, the same data, the same underlying method — but I add this regularization parameter and train for the same time, and we'll see the result. While it's thinking, I'll explain what regularization does. Inside the network are various matrices tweaking their parameters to make the best prediction — and here you can see we now get a much more generalized prediction that tries to ignore some of those outlying bits of noise. Ordinarily, the trainer decides whether it has a good or bad model by figuring out the differences between the training example points and the curve and adding them all up — in the machine learning world that's called a loss function; loss functions can get complicated when you're doing really rich things, but here it's simply the sum of the errors. In our overfitting example it minimized that by tweaking the line to go near every single point, so the sum of errors got small. What regularization does is add something to the loss function so that it punishes not only the errors but also the complexity of the model: the more nonzero parameters the model has, the more it is penalized, in the same way that it is penalized for not fitting the data.

So it takes both things into account: basically, it tries to fit the data while at the same time using the simplest model that achieves a good fit, because it favors models with very small or zero parameters. There are different kinds of regularization depending on whether you minimize the sum of the squares of the parameters, or the number of nonzero parameters, and so on, but the basic idea is that it punishes complex models over simple ones at the same time as punishing errors, and that compromise gives us something that fits the data without unnecessarily using all of the controls available to it.

A completely different method is to dilute the data — and this is a slightly counterintuitive one: we're going to make the fit better by making the training data worse. What we do is transform the network by inserting a new layer called a dropout layer, whose job is to randomly disappear whenever it's being used in training — it's a way of randomly killing certain parameters within the model. Imagine you're trying to learn the best way to drive across the country: you might end up with a very convoluted route that is technically shorter than everything else. What a dropout layer does is say, okay, we're going to randomly close certain roads sometimes — so if you sometimes drive a long way up a single-lane track only to find it closed, that suddenly becomes a terrible decision, because on that occasion it was an unreliable part of the route. By making parts of the model unreliable, it stops the model over-fixating. Going back to our analogy about cats and dogs: if one of the features we were learning on was, say, the length of the tail, and a long tail represents a cat, you want the model sometimes to be told "sorry, I'm not telling you the length of the tail, even though it's known for this training sample — you have to do without it". Otherwise you might find its entire model becomes "long tail equals cat, short tail equals dog, forget everything else"; if it sometimes can't rely on tail length, it has to also take into account ears and fur and face shape and other things. So the dropout layer prevents model fixation. You can see the training is much noisier now, because every time the model updates itself, the random dropout changes things — what was a good model on the previous step suddenly becomes a worse one or a better one, at random — but on the whole, over time, it ends up with a model that is richer than the previous one while still avoiding over-fixating on things like outlying data points. So we still get some level of generalization, achieved via a completely different method.

The third method is an interesting extension of the test/training split: as well as the test data and the training data, we give it a third dataset, which is effectively a test set for the trainer. We make some extra validation data, and we tell it to train the net on the same main data, with the same method and the same parameters, but periodically it takes a peek at this extra set and asks how it's doing against unseen data.
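A sketch of all three fixes applied to the toy network above; the L2Regularization setting and dropout probability are illustrative values of mine, not tuned ones:

```wolfram
(* 1. L2 regularization: penalize large weights as well as prediction error *)
regularized = NetTrain[net, data,
   Method -> {"RMSProp", "L2Regularization" -> 0.01}];

(* 2. Dropout: a layer that randomly disables activations during training *)
netDrop = NetChain[
   {LinearLayer[50], ElementwiseLayer[Tanh], DropoutLayer[0.5],
    LinearLayer[50], ElementwiseLayer[Tanh], LinearLayer[1]},
   "Input" -> "Scalar", "Output" -> "Scalar"];
dropped = NetTrain[netDrop, data, Method -> "RMSProp"];

(* 3. Validation set: extra held-back data the trainer checks itself against *)
validationData = Table[x -> Exp[x/2] + RandomVariate[NormalDistribution[0, 0.3]],
   {x, RandomReal[{-3, 3}, 20]}];
validated = NetTrain[net, data,
   ValidationSet -> validationData, Method -> "RMSProp"];
```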
The third method is an interesting extension of the test and training sets: as well as the test data and the training data that we need in order to decide whether the thing works, we give a third data set, which is effectively a test data set for the trainer. So we make some more validation data, and this time we tell it to train the net on the same main data, with the same method and the same parameters, but periodically it takes a peek at this extra data set and asks how it is doing against unseen data. Obviously if it looked at this too much it would effectively become part of the training data, we'd just have a bigger training set, and it would fall into the same overfitting. But because it only validates against it periodically, and never uses it to update the parameters, only to evaluate the model, we end up with two things going on here: this is the best set of parameters being found, and this, in blue, is the performance on the validation set. I'll run that again because it's a bit slow to see. Even though the model can be fitted better and better to the training data, the fact that it is not improving on the validation set means the best model it had was actually much earlier in the training process than the one it ended up with. It keeps trying to see if it can improve the model in a way that also improves the validation, but in the end it falls back to the simpler model it had earlier, the last one where the validation-set performance and the training performance were at all similar, so that it knew it was generalizing rather than simply overfitting. And again we end up with another model, but one with some level of generalization.

Now, these aren't distinct processes; we can do all three at once, and some of them are quite parameterizable. The regularization level: turn it down and we get a more sensitive model, turn it up and it becomes a much smoother, blunter model. Dropout layers can be set to be minor or major, and obviously we can make the validation set large or small. So there are lots of things you can tweak, as well as controls over how the training itself performs, but in the end a lot of this comes down to trial and error: training, validating, and tweaking the parameters of training to achieve the very best model, one that generalizes the way you need it to in the real world.
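In the same hypothetical sketch, the held-out data is passed to NetTrain through the ValidationSet option; it is scored during training but never used to update the weights. The 80/20 split is arbitrary.

    (* split the hypothetical data into training and validation examples *)
    {train, valid} = TakeDrop[RandomSample[data], Round[0.8 Length[data]]];
    best = NetTrain[net, train, ValidationSet -> valid, Method -> "RMSProp"];
    (* ValidationSet -> Scaled[0.2] would instead ask NetTrain to hold out 20% automatically *)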
So, in the last few minutes before I turn to the questions, and I can see we've got some already, so now is a great time to be queuing them up for me, let's talk very briefly about the modern generative AI that has hit the news in the last year. I'm only going to touch on this briefly because in a week I'm doing a webinar entirely on this topic, so I encourage you to come back for that. Things like ChatGPT, the generative AI models, are just machine learning models; there's nothing fundamentally new going on. In fact the earlier version of GPT, GPT-2, from before GPT-3.5, which is when OpenAI's service went live, is a model we have in our Neural Net Repository. It has a particular architecture designed for string manipulation, and it's actually quite small: there are a couple of graph layers and a chain that has more chains inside it, which probably have further things inside them, so there's still some complexity there. But it is just a neural network that, in this particular version, has been trained on the WebText data set, a corpus scraped from the web. What this model does is prediction: it takes an input string and predicts what comes next. So if I give it the input string "Albert Einstein was a", its best prediction is that it would be followed by the word "teacher". It's just a class predictor for the next best word, and that's all that ChatGPT fundamentally is; GPT-4, the current state-of-the-art model, is doing the same thing at its core.

Now, if we can predict the next word, we can of course append that word and ask again: "Albert Einstein was a teacher", let's put that in (oops, I didn't want the quotation box there, and apparently I can't type correctly in a live webinar), and we can predict the next word that follows that: "Albert Einstein was a teacher who". We can string that together recursively: here I'm nesting, applying that prediction recursively, in this case ten times, and we get something like "Albert Einstein was a respected German mathematician, primarily because he was an early...", and I could keep going. So we're now generating prose simply by making sequential word predictions, and that's all ChatGPT is doing: it's a model that does this at a very large scale. Where this network has maybe a hundred layers, the large language models are vastly bigger, with orders of magnitude more parameters, and they're trained on far larger corpora, so the ability to predict text becomes extraordinarily good. It can predict the answer that would follow a question, or the actions that would follow a set of instructions, and they start appearing to have human-like intelligence, because they can predict the conclusion that would follow some reasoning statements. It starts looking like intelligence, but in the end all it's doing is predicting the best words that follow.
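The recursion itself is just Nest. As a hedged sketch, suppose nextWord is a function (built in the notebook from the GPT-2 model in the Wolfram Neural Net Repository, NetModel["GPT2 Transformer Trained on WebText Data"]) that returns its string argument with the single most likely next word appended; nextWord is a hypothetical name here, not a built-in.

    Nest[nextWord, "Albert Einstein was a", 10]       (* apply the prediction ten times *)
    NestList[nextWord, "Albert Einstein was a", 10]   (* the same, but keeping every intermediate string *)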
Now, unfortunately these models, or at least the state-of-the-art ones, are so big that they're beyond what you can run on a typical desktop computer. There is a new generation of mid-size language models that we're working to curate into the whole framework, so that they become transformable and retrainable by our machinery, but we're not quite there yet, and the state-of-the-art ones you simply won't be able to run on your home computer. So what we've done in the Wolfram Language is integrate them into the language as external services. If I evaluate this LLMSynthesize command, I can take an instruction, call (in this case) the OpenAI API, and get the answer back from that remote service. So this wasn't executed locally on my machine like almost everything else in this demonstration; it was executed by OpenAI's GPT, 4 I think is the default, no, 3.5 I think I called here, to generate a limerick. That's not a very good limerick, I might give it another go, but you get the idea.

It turns out that for using these models to their full power, not simply generating conversational chat or text, there are a few paradigms that make them really quite usable within the language. Unfortunately I see I've accidentally deleted this bit, so I'm going to recreate here what should be in the notebook; it's probably missing from the downloadable notebook as well. We have the notion of making something that looks like a function in the Wolfram Language but is powered by the machinery of large language models, and the idea is that you are basically programming in English rather than in code. What my function is going to do is classify something it hasn't seen into animal, vegetable or mineral. This is what's called prompt engineering: I'm giving instructions in English, and those instructions are fed to the LLM before the parameter and set up the task it has to do. I probably need to give it a bit more here; you have to be very pedantic with these things, so: return only one word. Let's see what happens if I apply that function to this list of data. It calls the function four times, once for each of the four data points, and it's done: cat, animal; cabbage, vegetable; diamond, mineral. So it's done the right thing for all of those, and I threw in a curveball of something that isn't obviously any of them, and it decided that animal is a better classification than vegetable or mineral for a concept like love.

The other basic paradigm we have is the same kind of thing but driven by examples, which it turns out large language models are really quite good at generalizing from. Here I'm giving it some prompt engineering again, saying: extract the name and age of a person from the text, and return a Wolfram Language association with the keys Name and Age, looking like this. Then here are some examples: "John is 52", and I want you to return this data structure; "a man called Mike is old", return this data structure, so I've given an example of what happens when the age isn't mentioned at all, where I've said Missing. And I want the result parsed as a Wolfram Language expression when we get it back. Hopefully, if that works, I can now give it this unseen text, "my manager Jane is very young, she's only 30", and it has said Name: Jane, Age: 30. So it has correctly pulled out the values and restructured the data. This is really powerful for dealing with unstructured data, for transforming data, and for classification tasks where you have a limited amount of data. For something like this I might have used Classify if I had thousands of examples; when I only have a concept, and enough energy to generate, well, in this case no examples, but perhaps a dozen or two quickly, and no proper data set, then large language models are quite good at using their innate knowledge to extrapolate from the description or the examples and do the right thing.

A couple of other paradigms in this world: one is the idea of providing tools, which brings the best of the computational world together with the LLM. Here I'm saying: use OpenAI to write a paragraph about Chicago and New York, but I don't trust it to know up-to-date information about things like city populations, so I'm telling it there's a tool available that will find the population of a city. There's a prompt saying what the tool does and the parameter it expects, and it runs a Wolfram Language command against our built-in data; to show you what that does, if I type something like "London city" as a free-form interpretation, we're looking that up from a database within our system. So we're giving the LLM access to that computational tool, and the LLM will happily go off and try to do this, and at some point, hopefully in a few seconds, we'll get our answer back. Well, with remote services you're not always in control of when they want to talk to you; it's having another conversation, so this has ended up being a multi-part conversation between my computer and the LLM, and I'm hopeful that this time it will stop going backwards and forwards. Normally this takes a couple of seconds, so I think OpenAI is a bit busy at the moment.
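For reference, here are hedged sketches of the three calls being demonstrated. The prompts and examples are paraphrased from the talk rather than copied from the notebook, the tool definition is omitted because its exact setup isn't shown here, and all of these go out to the configured LLM service, so they need service credentials set up.

    LLMSynthesize["Write a limerick about machine learning."]   (* paraphrased prompt *)

    (* instruction-driven: the prompt is the program, the `` is the argument slot *)
    classify = LLMFunction[
      "Classify the following as animal, vegetable or mineral. Reply with exactly one word: ``"];
    classify /@ {"cat", "cabbage", "diamond", "love"}

    (* example-driven: the model generalizes from input -> output pairs;
       the outputs are strings here, converted back to an association with ToExpression *)
    extract = LLMExampleFunction[{
       "John is 52" -> "<|\"Name\" -> \"John\", \"Age\" -> 52|>",
       "A man called Mike is old" -> "<|\"Name\" -> \"Mike\", \"Age\" -> Missing[]|>"}];
    ToExpression[extract["My manager Jane is very young, she's only 30"]]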
Now, what has it done here? From here down to here was all ChatGPT, but at this point it realized it would be useful to get some facts. Instead of painting itself into a corner, it wrote "the city of Chicago has a population of approximately", called back to my tool, ran a Wolfram Language lookup of the population of Chicago from the database, which was then written into the answer, and then carried on its way and called back again. That's why there was a conversation: the LLM did some work, we did some work, the LLM did some work, we did some work, and then the LLM wrote the conclusion, and we get our paragraph containing the best of both worlds.

One other thing in this generative world is image synthesis, but it's basically the same idea. We're using big neural networks that have been trained on lots of data combining text and, in this case, images, and we give a prompt describing the image we want synthesized. It has enough knowledge about the short- and long-range relationships in images, about things like dogs and where a cigar goes when you smoke it, and it's come up with this example. These LLMs and generative models have a certain amount of randomness in them, because that makes them seem more human-like, so if I call this again I'll get a different answer. I don't see any sheep in the background, but maybe that's a sheepdog back there staring at my brown, cigar-smoking hound.

So that covers what I wanted to talk about. We've gone from the basics of what these things are doing in simple cases, through the fully automated commands in the Wolfram Language like Classify and Predict, how you work through the training process of train, measure, refine and test, and what the neural network framework allows in the same kind of paradigms but with a much richer lexicon, through to generative AI. We do have a longer version of the neural network talk, but I don't know when that is next scheduled, so watch our webinar stream for when it comes up if you want more detail on actually working with those networks.

In our schedule we still have about fifteen minutes left, so I'm now going to turn to the chat; give me a moment to find the questions. "Text structure for further languages would be helpful." Indeed it would. There is a bridge you can use now, of course, which is that text translation is very good, particularly through things like the LLM tools, so there is probably something that could be done to map to an English sentence structure and map back again; I'd have to think about whether that would make an interesting resource function for our repository. Up to now we haven't had the training sets for the other languages to be able to build the sentence deconstructor for anything other than English.

OK, a slightly more complicated question: somebody says there's a problem with the anomaly detection commands in slide six. Possibly I have submitted the wrong version of the notebook, or it may be that it's calling on some data; where's the anomaly section example... You will need internet access to fetch the resource data, and it may also download a neural network the first time in order to work. So if you're having trouble with that, I would suggest you restart Mathematica, run it again, and make sure you've got good connectivity; the first time through it downloads and caches things locally, and after that it should run just fine.
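As a quick check that does not depend on the notebook's data, something minimal like the call below exercises the anomaly-detection machinery on its own; the list is purely illustrative, not the notebook's example. The notebook's version additionally pulls resource data, and possibly a feature-extraction network, from the cloud on first use, which is why connectivity matters.

    FindAnomalies[{1.2, 1.1, 1.3, 1.2, 9.8, 1.1, 1.25}]   (* the 9.8 is the intended outlier *)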
If you're still having a problem after that, do post to our tech support, because then there's something wrong with your particular installation.

"A classifier can get much better with more data, but where can we get data?" That is a fine question. If you look at things like Kaggle and various open-source projects, there are some quite big training sets that people have built. But in the end, one of the things we try to focus on is tools for solving your problems. It's all very well being able to download pre-trained networks for certain tasks, but where this usually gets interesting is doing something new, and if you're doing something new that only affects your world, then you're likely the only source of the data. If you're trying to predict your customers' behaviour, there's limited value in fetching third-party data on customer behaviour if your customers have a different bias, slant and set of interests from everyone else's.

"The calculations seem to be very extensive; can the GPU be used if present, and does that happen automatically?" The answer is that it doesn't happen automatically, but many of the commands support it; we only support CUDA. I think this is going to fail because I don't think I've ever set this computer up, and I'm not on my own machine, so I don't know what state it's in, but let's go back to the Classify example I did before and find out whether I'm up to date with my CUDA libraries. Here's the Classify example. The way you do it is to add an option that most of the machine learning commands have, not Method, that's the one we were using already, but TargetDevice. TargetDevice defaults to "CPU"; if you have CUDA hardware, you set TargetDevice to "GPU" and it will use it. So it has detected that I do have CUDA hardware on this computer, but I haven't got the CUDA libraries, so, one time only, it is now downloading and installing them, and maybe, if we have the patience, that will finish. When it runs, depending on your hardware, it can sometimes be thousands of times faster. We also support batch submission to things like Amazon and Azure hardware, so if you don't have big stacks of high-memory machines and big GPUs, you can rent them by the hour from Amazon or Azure, and we have an automated mechanism for placing a classification job like this onto Amazon. Once trained, the classifier can be run locally on CPU or GPU, and running is generally much, much faster than the training step; in this small example there's probably not much to be gained. When the data sets get large, we also have mechanisms for training out of core: I've done the lazy thing here of loading all the data into memory and training on it, but when you have terabytes of examples you want to load them in batches, update the trainer, then load the next batch, so you never use too much memory at a time. I think it is now probably busy installing the CUDA software it downloaded, so I might kill that in a moment and make sure I'm properly set up the next time I use this machine.
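For reference, the option looks like this. It's a hedged sketch with hypothetical data and net names, and it assumes NVIDIA/CUDA hardware with the CUDA libraries installed (which, as above, happens automatically the first time it is requested).

    Classify[trainingData, TargetDevice -> "GPU"]   (* hypothetical labelled examples *)
    NetTrain[net, data, TargetDevice -> "GPU"]      (* the same option works across the neural net framework *)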
"Would you recommend an algorithm for face recognition of twins?" There are some specific neural nets out there for face recognition, though I don't know what the state of the art is for twin recognition; I suspect twin recognition is extremely difficult, and these nets aren't necessarily magic in their abilities. We do have one built in: there's a high-level command for face recognition, I forget what it's called, something like FaceRecognize... there we go, FaceRecognize. That is simply backed by one of these custom-designed neural networks, and what FaceRecognize does is package up the training process to make it simpler; it does things like automatically locating the faces in the images as part of the workflow. But you can see the mechanism: you train it with labelled examples, then you apply it to unseen images and it makes the prediction.

"Is there support for hybrid models, using partly models and partly data?" We've got nothing built in for that. It is something I've given a little thought to, about trying to make a utility function, because one approach you can take is this: say you've got ten input data points and you've got a model that gives you some idea of the physics of the situation. You can build into the encoding process something that encodes those ten numbers not into a vector of ten numbers but into a vector of eleven, where the eleventh number is the output of the physics model you've coded up. If that physics model is really good, then when you train on those data points the trainer will learn that that last feature is the most valuable one of all; if the model isn't very good, or breaks down in certain situations, at some point it will learn that sometimes the better prediction comes from ignoring the model, and it will learn the situations where the model is good and the situations where it isn't. I think that could be packaged into a neat, single-step "train with model" function for the Function Repository, but that's only the approach I would take, and I don't think it's a complicated function to write.

"You discussed a function of a single variable; is it possible to study a function of two variables?" I want to distinguish between outputs and inputs here. For a single output, commands like Classify and Predict work; when you need multivariate outputs you generally need to start building neural networks, so you can encode the dimensionality of what comes out. If you want to generate a picture as the output, for example, Classify and Predict can't do that for you, unless the class is one of a set of pre-decided pictures. As for inputs, I only used one-dimensional input for my overfitting example because it's easy to visualize and easy to explain; all of those examples, to do with input dimensionality, measurements, and controlling overfitting and underfitting, work in arbitrary dimensions in exactly the same way.
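Going back to the hybrid-model question for a moment, the feature-augmentation idea might look something like this minimal sketch; physicsModel, augment and trainingData are all hypothetical names used only for illustration.

    physicsModel[x_List] := Total[x]          (* stand-in for whatever physics-based estimate you have *)
    augment[x_List] := Append[x, physicsModel[x]]
    (* trainingData is assumed to be a list of rules: inputVector -> outputValue *)
    p = Predict[augment[First[#]] -> Last[#] & /@ trainingData];

During training, Predict is then free to learn how much weight that extra feature deserves, which is exactly the behaviour described above.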
"How do I determine the number of layers in a large language model?" The very large language models, or at least the main ones, are not open source, so you can't actually see for sure what's going on, and you can only rely on whatever the people like OpenAI tell you they have. There is, as I said, a whole collection of mid-size models now which are open source, where you can actually get your hands on the model and study it, though we haven't got those into our neural network framework yet. If you simply want to count layers in a network that is in our framework, we can take that image-identify model and convert it into a list of layers; maybe that was a big one to use, and we're still waiting for that CUDA installation to finish, so let me kill that (I don't know the keystroke on Windows for aborting an evaluation, so I'll kill it the hard way). In fact we can just use Length if we want the top-level layers: Length of NetModel of, what was I using, the Wolfram ImageIdentify network, gives the answer. There we go: at the outer level the thing has 24 layers, although we'd need to reach inside and write a slightly richer pattern to collapse all of the nested levels. Basically all of those things are interrogable if they're in the NetModel representation; if they're external, you have to go and look at the source code or read the documentation for those models.

Someone was asking whether I'll post a link with the additions to the notebook: yes, I will fix the missing example, save this, and share it in the follow-up to anyone who has registered with their email address.

"I would like to implement some of this on a supermarket sales data set; can you offer some guidance?" Yes, here are a few first steps. Even though machine learning offers the promise of not having to use your brain too much, start by thinking about where you're likely to get the strongest signals. You could throw all the data you have at it and try to get it to produce something, but you're better off thinking it through: if the socioeconomic class is known to me, that's got to be a strong indicator; if the age is known to me, that's got to be a strong indicator; the colour of their car probably isn't. Pick out the things you think will be strong features, that you have good clean data on, and that you can train against, and start by trying the simple cases. Obviously do what I said about holding back a test set. If you can get the simplest cases to do something useful, you're on to something, and you can start feeding it more data and more challenging things to predict; but if you can't get the simple things to work, you're going to waste a lot of effort not knowing whether you've overloaded it with irrelevant data or whether it's the quality of your data. So start simple and build up. And of course, since I'm supposed to be helping the company here and not just talking about things I'm interested in, I should mention that we do have our Technical Services teams who can help: if you have a project like that and you want some assistance, we have a team that can come in and do the work for you as an external project.
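Coming back to the layer-counting question, the call was along these lines; the model name is the one in the Neural Net Repository, and the second line is an added suggestion, assuming Length behaves on the flattened net as it did on the original in the talk.

    net = NetModel["Wolfram ImageIdentify Net V1"];
    Length[net]              (* what was evaluated in the talk: 24 top-level layers *)
    Length[NetFlatten[net]]  (* one way to count the individual layers inside the nested containers *)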
"Can it predict the functional form of the underfitted, well-fitted or overfitted model?" The model itself does have a functional form in the end, but it's a massively complicated combination of all the multiplications and nonlinear elements in your network. One thing you could do, though, is take your model and resample from it: do an equally spaced sampling, say, marching through the expected range of the model, and then we have a command that takes data and tries to find a formula for it. It's not one I use very often; I'm pretty sure it begins with Find, so let's list all the Find commands... well, that's not very helpful, is it. There is a command on this list, no, not FindEquationalProof; there is a find-formula type command and I can't see it and can't remember what it is. It will predict the equation, and it's reasonably sophisticated: it isn't just "that's a straight line" or "that's a quadratic", it will say, oh, it's the sum of Sin[6x] plus a linear function. You would apply that to the data you've generated from the model; that's probably the best way of doing it. If it's a probability distribution we have FindDistribution, but for an equation it's a different command; maybe it's linked from the FindDistribution documentation page... FindFormula, that could be the one I want. Yes, here we go. FindFormula takes a set of data and a variable and generates the formula that comes out of it, and you can see it can do reasonably sophisticated things: FindFormula of that data, and it's come up with this formula. So that's how I would go about it: generate data from the model and then use FindFormula on it. In the end FindFormula is a custom bit of prediction backed by a neural network and heuristics, but it's already packaged up as a standard function.

"Do you think it would be possible to train a model for selecting audio recording takes based on intonation, sound quality and rhythmic precision?" Absolutely feasible. There are already some pre-built networks for aspects of voice: we have ones built in for recognizing the words spoken, of course, but also the speaker, so if you have two people in conversation, partitioning it by speaker is built in already. You would probably, again, do best to find some labelled data, "this is somebody angry", "this is somebody happy", and I don't know whether such sets exist in good enough shape, but if they do, the same kinds of techniques would apply to saying something about quality. On rhythmic precision, it's worth pointing out that my line about using machine learning when you're data rich and understanding poor has a converse: if you're understanding rich and data poor, there are other techniques. If you're trying to detect whether somebody speaks with a very rhythmic pattern, you might do better with things like Fourier analysis or wavelet transforms, to pull the underlying rhythm out of the sound regardless of the voice; that might actually be more accurate. Similarly, for things like pitch detection there are approaches that are neural-network and machine-learning based, but also numerically based signal-processing ones, and sometimes hybrid solutions are the way to go, so don't always leap to machine learning as the first step.

Somebody suggested FindFit while I was stuck looking for that command: FindFit takes a given formula and tries to find the parameters that optimize it, so that's basically regression; it was FindFormula I was after.
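Putting those two steps together, here is a hedged sketch of the resample-then-FindFormula idea; predictor stands in for whatever trained model you have, and the range and step size are arbitrary.

    samples = Table[{x, predictor[x]}, {x, -3, 3, 0.05}];   (* equally spaced resampling of the model *)
    FindFormula[samples, x]                                 (* ask for a symbolic formula in x *)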
"Where can I find info on getting an ONNX model converted to the Wolfram Language, and on selecting the GPU as a device for training and inference?" There are two things to say about ONNX, which is the Open Neural Network Exchange format, and the Wolfram Language. If you want to use our framework, we have ONNX import and export support. It isn't 100%, so it depends on exactly which layers the model uses, but for many models you can just use Import on the ONNX file and you get the net model in our representation. Where that is not possible, we also have something called NetExternalObject; I misspelled that, it's NetExternalObject, not "net external model". The idea of the external object is that where you can't import a model, you can still reference it and do certain operations on it from within our framework: you can retrain it and you can execute it, but you can't do net surgery on it. If you can't get the thing into the symbolic representation, you can't do the NetReplacePart type operations, but if all you want to do is retrain and use the model, then NetExternalObject lets you point at an ONNX model that isn't importable and treat it much as if it were built in.

"Can I use the Wolfram technology for an object identification project on a robot?" In terms of object recognition and identification, yes, that's exactly what this machine learning framework can do. There's a little subtlety in the "on a robot" part, because if your robot has embedded hardware, we don't at the moment have the ability to do code generation for it. We can do embedded hardware at the level of a Raspberry Pi: if you've got something like a Raspberry Pi on an Internet-of-Things type device, we can run the Wolfram Language on it. But if you've got other custom hardware, not a proper full computer, you won't be able to run this on the robot itself, so you either develop the net model within our framework and then export it to something like ONNX or MXNet and use one of those executor libraries to run it on the robot's hardware, or you have the robot call an external service: anything in the Wolfram Language can be mounted as an API on a cloud server, so the robot would send the image to the server, ask where in the image the box is, and get the numbers back. So there are several approaches to where you execute it; in terms of solving the intellectual problem, absolutely, the only question for a robot is where that model is going to run and on what hardware, and that is where the subtlety lies.

"If you have an SEM image of nanoparticles on the surface of a support material, how can you instruct image classification to give the population of particles of different sizes in the SEM image?" Again, one thing I would say is that there are non-neural-network methods for all kinds of image partitioning, so you might do better to use one of the image filters to filter by colour or texture, then use morphological components to segment the image into blobs, and then measure the blobs and find their sizes.
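For the SEM question, here is a hedged sketch of that classical route; the file name, the plain Binarize step and the choice of measurement are all illustrative, and a real SEM image would probably need some filtering before thresholding.

    img = Import["sem-image.png"];                       (* hypothetical image file *)
    comps = MorphologicalComponents[Binarize[img]];      (* label the connected blobs *)
    sizes = ComponentMeasurements[comps, "EquivalentDiskRadius"];
    Histogram[Values[sizes]]                             (* the population of particle sizes *)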
Another approach, which I didn't show today (and I've got a terrible memory for command names today), is a trainable content detector; let's see if I can look it up. The ImageIdentify command I used works on a whole image, but the content detector functions are a version of that technology for sub-image recognition. Here's an example where we give it examples of things you might see in an image; no, that's not the example, here's the example: there's the image, and we've got a rectangle that says within that rectangle there is an apple, and within that rectangle there is an apple, and in this image, within that rectangle, there is a strawberry. So we're showing it not just what is in an image but also where in the image, and it will be capable of dealing with different sizes as well, so if we have small features and large features we'd want to give it examples of both. Then, when you call the thing in anger, you give it the test image and it identifies the things it recognizes along with the coordinates of where it finds each feature. That's one way you could go about it, possibly as a pre-step: you identify where the particles are and then segment just that box, using classic image segmentation techniques, to get the perimeter more accurately. You might also want to look at some of the image segmentation neural networks available in our Neural Net Repository. And I'll give you a little glimpse of something that doesn't exist in the Wolfram Language today: we are shipping Version 14 very shortly, and there is a new neural-net-based image segmentation capability. I'll get there quickly by looking at the image segmentation components and image segmentation filter functions. The components function does classic image-processing-style segmentation, but uses a segmentation neural network to achieve it in quite robust conditions, and the filter is a way of asking, for a given pixel we're interested in, what component surrounds it, so we can pick out, for example, the carrot or the aubergine or whatever it is in this image, and it has fairly robustly found the extent of that segment. So that might also be a useful first step that crosses between the two worlds.

The machine learning features, everything I've shown, are available on the Raspberry Pi as well, but bear in mind that the Raspberry Pi is probably a terrible place to do the training, because it's slow and has fairly limited memory. In practice you would want to do the training side on a desktop computer or in the cloud and then copy the trained model onto the Raspberry Pi to execute. You could do the training on the Raspberry Pi, there's nothing to say it can't, it has the capabilities, but it's pretty underpowered hardware for serious work.

OK, so, Malia, your problem sounds like something wrong with your installation of Mathematica, so drop an email either to me or to tech support and we'll get somebody to figure out why that isn't running on your computer; I don't think it's a general issue. And, unless I'm missing any from YouTube... OK, they've copied over, so I think we're running out of questions and time simultaneously; we're ten minutes past the scheduled end, so this seems like a good point to stop.
So let me wrap up by thanking everyone who has stayed to the end for listening. I'm back next week talking about LLMs in more depth; this is the link to the webinar if you want to register now, or, if you've got your phone on you, there's a QR code you can use to get to the same link. Do come and join that, or any of the other webinars in our series. We are also doing a post-webinar questionnaire, so when we close the webinar in a few seconds, if you hang on for an extra ten seconds there'll be, I think, a three-question survey that helps us understand whether we're giving you the right kind of content. It's really useful to us, so please stay behind for a few extra seconds and click three buttons to say what you thought of today. Thank you very much.