Real-World Python Machine Learning Tutorial w/ Scikit Learn (sklearn basics, NLP, classifiers, etc)

Video Statistics and Information

  • Original Title: Real-World Python Machine Learning Tutorial w/ Sci-kit Learn (sklearn basics, NLP, classifiers, etc)
  • Author: Keith Galli
  • Description: In this video we walk through a real world python machine learning project using the sci-kit learn library. In it we work our way to building a model that ...
  • Youtube URL: https://www.youtube.com/watch?v=M9Itm95JzL0
Posted by u/aivideos on Oct 03 2019 (1 point)
Captions
Hey, what's up guys and girls, and welcome back to another video. I'm very excited for this one. Today we're going to be going through the scikit-learn library for Python, which is a very important library for machine learning in Python, and this video was pretty highly requested, so I'm excited to finally get to it.

What we'll specifically be doing in this video is slowly walking our way through the scikit-learn library and showing the different avenues you can go, but our ultimate task is building two different machine learning models. The first model automatically classifies text we put in as positive or negative. For some examples, if I ran this line of code: "I thoroughly enjoyed this, five stars" — that's positive; "bad book, do not buy" — that would be a negative example; "very interesting stuff, thank you" — that's positive. As you see, the model outputs the corresponding values. I could change one to something like "horrible, do not buy" — oh, I already said do not buy — "horrible waste of time", and we'll see it switch to negative. Cool. So it automatically knows whether these are positive or negative, and this could be applied to all sorts of cool things: I could incorporate this model into my YouTube comments and see how much positive and how much negative feedback I'm getting. So it's really fun playing with actual text data and creating a scikit-learn machine learning model that gives this as the output.

Then we'll do one other task which is very similar — the model doesn't change too much, but it's cool to see a bit more of what you can do with the same kind of scikit-learn machinery. This is also an NLP (natural language processing) model, but instead of positive or negative we have several different categories that we're grouping our comments into. The way you can think about this one: imagine you're in charge of something like Twitter's HR — you're getting all this positive and negative feedback to your Twitter account, but you don't know what products people are talking about. This machine learning model automatically classifies text into a category: "great for my wedding" would map to a clothing category, "I loved it in my garden" would be the patio category, and "good computer" would be electronics — and as you see, those map correctly.

So those are the two models we'll be building, and we'll be walking through all sorts of cool stuff with scikit-learn along the way. Quickly, let me cover the timeline of this video (this will also be in the comments, so make sure to check that out). We'll start with a brief overview of why you'd use scikit-learn — what its purpose is, when you should use it, and when you shouldn't. Then we'll jump into loading data into scikit-learn and what it can do to help us with that. Next we'll be choosing our classifier; once we choose our classifier, we'll be evaluating the performance of the different classifiers; then we'll do some fine-tuning of the classifier to make it even better; and we'll also save our model, so you don't have to retrain it every time if you wanted to use it in production. Also, real quick: this is a fairly long video, so if you want to break it up into roughly 15-minute chunks, I'm going to try to structure it so that's pretty easy to do. Don't feel like you have to watch the whole video in one sitting — watch one chunk and come back to the next part later. But I wanted to keep it all together because I think it's good to see the full path from raw data to final model.

OK, a little background on scikit-learn and machine learning more generally. I would say that pretty much every machine learning task has several steps associated with it.
At its core, any machine learning project starts with some question you want to answer — that's always the first thing. Then you need to find data that will help you answer that question, so you can build a model around it. Once you've found that data, you usually have to do some prep on it — some sort of processing, some sort of filtering. Once your data is all prepped, you build some sort of model around it. Once you have a model, you move on to evaluation: how well is your model performing? And after that you make improvements — you tune the model, try different parameters, maybe tune your data a little more, and generally try to improve the model as much as possible.

scikit-learn helps simplify that entire pipeline. A lot of the common things you'll want to do — improving your model, getting your data ready for model-building, or actually applying a specific algorithm — scikit-learn packages all of that up into a nice library for you. In a little more detail: looking at the scikit-learn site, there are all sorts of classification algorithms out there, and the raw code for a given classification algorithm is long and convoluted, but with scikit-learn you can use that same algorithm in a couple of lines. Same with regression and clustering — everything that might take a lot of your own code to write, scikit-learn has already packaged up, and it's easy to use yourself.

Another separation I want to make real quick: very generally, there are two types of models you can build in machine learning. On the one hand you have neural network / deep learning models, and on the other hand you have these traditional algorithmic machine learning models. scikit-learn is, I think, really helpful for the traditional algorithmic models — they've packaged up all of that for you on the algorithm side. We're not going to cover neural-net-type stuff in this video; looking at the documentation, scikit-learn does appear to have a little bit of neural net material, but if you want neural network machine learning experience, I'd really recommend checking out TensorFlow, or PyTorch, which I personally use a lot — libraries that are really built around neural networks. So we're focusing on the traditional algorithmic side here.

OK, now that we've gone through a little about scikit-learn, let's go back to the question we were originally trying to answer: automatically classifying comments as positive or negative. To do this, we first need some data to train our model. The easiest — the initial, stupidest — approach would be me manually writing examples and labeling each one negative or positive by hand. I could do that, but I'd be here for years just creating enough data for the needs of this video, let alone a good model. So manually creating data is probably not the best approach. You could crowdsource — get a lot of people to manually label data for you — which could potentially work a little better, but it's still time-consuming and probably costly; not the best. The best approach to getting the data you need is to be creative: how can we find data that already exists that will help us do what we need? For this positive/negative feedback model, the place I decided to go was amazon.com. If we click on any product — like this cute little costume (oh my gosh, this is so adorable) — and go to the reviews, the reviews of basically any product are already-labeled data for us.
If something is five stars, the text in the review is going to be very positive. On the other hand, let's see if we can find a one-star review — I don't know if I'll find one with how adorable this costume is — okay, yeah, we have a one-star review right here; let me make it a little bigger: "this is not quality at all". In this one-star review we have negative feedback, negative things people are saying. So if we take a lot of Amazon reviews, we can use them as our training data — and that's exactly what we're going to do.

What I ended up doing — and I made it a little simpler for you — builds on the work of Julian McAuley at UCSD, who did a lot of the work for us: you don't actually have to scrape Amazon yourself, because from 1996 to July 2014 he collected all sorts of Amazon review data. I went ahead and took some of that and broke it down into a more manageable number of reviews for several categories, from the year 2014 (I have a script that shows exactly how I did it). So that's what we'll be using as our training data.

To get that data and begin actually writing our code, go to my GitHub page — github.com/keithgalli; the exact repo is linked in the description. And while you're at it, follow my GitHub, because the more milestones I hit, the more fun pictures like this I get to post. Once you're on the repo page, go to data, and to begin I recommend just going to sentiment and downloading the file I called Books_small — this is 1,000 Amazon reviews from 2014, specifically on e-books. Go ahead and click on that, then click Raw and do "save as" (saving as a text document should be fine, or just select "all files" and save it as .json — I don't mind where you save it; I already have it saved). Save it somewhere close to where you're writing your code.

OK, open up your favorite code editor — I personally like Jupyter notebooks for anything data-science related — and let's start diving into the machine learning task. To begin, before I write even any code, let's look at our data. This Books_small file I told you to download looks like this: it's a JSON file where each line is its own JSON object, so if we want to load it in a logical way, we're going to have to read the file line by line. What are the important fields in this JSON? I see a "reviewText" field — that's the text content of the review, so we want that — and the other thing that's going to be important (oh my gosh, this is a long one) is "overall", the out-of-5 star rating. Those are the two fields I really care about, and we can do some additional stuff with them later.

So that's the file; let's start processing it. One thing I recommend: first import the json library, which will let us parse the file. Then, since we want to load it line by line, here's what we can do. First we need our file name: mine is relative to the code I'm writing — it's within the data folder, then the sentiment folder, and it's called Books_small.json. You might have something different. Now we open the file: with open(file_name) as f — so the file object is called f — and then for line in f, and just to see what we have right now, I'll print each line. This is Python 3, just a heads-up.
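The line-by-line loading described above can be sketched like this. Note the sample line below is a made-up stand-in for a real line of Books_small.json — in the actual script you'd iterate over the downloaded file instead:

```python
import json

def load_reviews(lines):
    """Parse one JSON object per line into (text, score) tuples."""
    reviews = []
    for line in lines:
        obj = json.loads(line)                       # each line is its own JSON object
        reviews.append((obj["reviewText"], obj["overall"]))
    return reviews

# With the real file you would do:
#   with open("./data/sentiment/Books_small.json") as f:
#       reviews = load_reviews(f)

# A hypothetical sample line, just to show the shape of the data:
sample = ['{"overall": 4.0, "reviewText": "Great storyline, would definitely recommend."}']
print(load_reviews(sample))  # → [('Great storyline, would definitely recommend.', 4.0)]
```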
So print takes parentheses around its argument, and I'm going to add a break so I don't run through everything right now. OK, cool — we did get something out of that. Now what I want to do is quickly grab the overall rating and the review text. This one is definitely positive — a four-star review praising the author's unique new novel — and it's about books, so let's try to get that review text. If this line were already loaded as a dictionary, we should just be able to do line["reviewText"] — let's see if that works. Ah, we're getting there: the reason we get an error is that right now this is just raw text, so we need the json library to actually load it as a dictionary. We can say review = json.loads(line), and now if I print review["reviewText"] we get what we're looking for — cool, just the text. And if I want the overall score, review["overall"] is the JSON key that produces the right value — 4.0, so that's the four stars. We got both of the things we wanted.

Now we just need to gather this all up in a nice way. I'll get rid of the break statement, start with reviews = [] as an empty list, and for each line do reviews.append(...) — maybe appending a tuple of the review text and the score. I'll go ahead and delete the print, because I don't want it all to print for all thousand lines. Just to check that it works, let's print out a random element — say review number five: "love the book, great storyline, keeps you entertained; for a first novel from this author she did a great job, would definitely recommend". This is a very positive review — I'm surprised they gave it just four stars when they could have given five, but whatever. It looks like it loaded properly, and now if we want to access a review's text or score, we index into the tuple.

This indexing does work, but it's not the neatest way, and one issue I see with a lot of data scientists and machine learning engineers is that things get messy — their data gets messy, and it gets hard to parse through their code. So real quick, let's make a data class for all of the data we're loading in. I'll do this above: we'll call the class Review, and we initialize it with (self, text, score) — you always pass self — and basically self.text = text and self.score = score, where the score is the number of stars. We'll do some additional stuff within this class, like converting the score to a sentiment, and ultimately having this class will make things neater. So now, instead of appending a tuple, we create a Review object, passing in the text and score we already have. And instead of indexing with [1] to get the score, I can do reviews[5].score — and as you see it stays the same — and reviews[5].text now easily gets the text. Just a little neater, and it helps you keep track of things.

We're going to do a bit more within this class. One thing I want to do is initialize some sort of sentiment — self.sentiment — where four or five stars means positive and one or two stars means negative.
Three stars, I guess, we can use as neutral. I'll create a function within the Review class called get_sentiment — all functions within a class take self — and in the initializer I'll set self.sentiment = self.get_sentiment(), so once we fill out the function, the field gets whatever it returns. OK, so four or five stars should be positive and one or two stars should be negative. I'll start with the negative case: if self.score <= 2, we return "NEGATIVE" — I'll just use strings for now. elif self.score == 3, that's the neutral case — I don't know if we'll actually use it at all, but I just want all of our possible scores covered — so return "NEUTRAL". And finally, else — this is a score of four or five — return "POSITIVE".

Another small thing I like to do whenever possible: I don't like having bare strings floating around; I like to be very consistent with those strings so I don't accidentally grab the wrong thing. So I'm going to create an enum-style class — it's just a regular class, but you call these enums in programming speak — and I'll name it Sentiment. It has a couple of properties: NEGATIVE = "NEGATIVE", NEUTRAL = "NEUTRAL", and POSITIVE = "POSITIVE" (you'll see why in one sec). The reason I did this: if I run things again, I can now do something like reviews[5].sentiment — remember this is a four-star review, so it should be positive — and it says POSITIVE; that's right, we set that up. But now, instead of always typing out the strings "NEGATIVE" and "NEUTRAL" and maybe forgetting how we capitalized them or spelling them wrong, we refer to these things through the Sentiment class: Sentiment.NEGATIVE, Sentiment.NEUTRAL, Sentiment.POSITIVE. It's really just to ensure we're being consistent; it's also kind of nice because our editors can autocomplete these for us. Now if we re-run this and go back down here, it still says POSITIVE — it's just a little neater having this kind of thing. You don't have to do it, but it's something I like to do. So we now have this Review class that automatically fills out the positive or negative sentiment — we're getting there.

All right, next let's do some further prep of our data. Basically, the issue is that when we're dealing with text data, it's really hard to build machine learning models directly on it — machine learning models love matrices, numerical data, numerical vectors as input. So we're going to talk about some ways to convert text into a quantitative vector, and we're using bag-of-words to start. Real quick, to understand how bag-of-words works if you don't know already: imagine we have these two phrases, "this book is great!" and "this book was so bad". The way bag-of-words works, we first break these up into a dictionary of tokens, or words — in this case we'll do unigrams, so just single words, as our dictionary. From the first phrase we get "this", "book", "is", "great" (sometimes you might optionally include the exclamation mark, but it depends on how you're vectorizing); from the second phrase we also get "was", "so", "bad". If we were training a bag-of-words model, we'd use all of those words to create this sort of dictionary. Then, to actually convert a phrase into a numerical vector, all we have to do is map ones and zeros onto those terms: "this book is great" would get ones for the first four words and zeros for "was", "so", and "bad", since they aren't in it — ones where the word is present, zeros where it's not.
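Putting those pieces together, the Review class with its sentiment logic and the enum-style Sentiment constants look roughly like this:

```python
class Sentiment:
    # enum-style constants so we never mistype the raw strings
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"

class Review:
    def __init__(self, text, score):
        self.text = text
        self.score = score                    # out-of-5 star rating
        self.sentiment = self.get_sentiment() # filled in automatically

    def get_sentiment(self):
        if self.score <= 2:                   # 1-2 stars
            return Sentiment.NEGATIVE
        elif self.score == 3:                 # 3 stars
            return Sentiment.NEUTRAL
        else:                                 # 4-5 stars
            return Sentiment.POSITIVE

print(Review("Loved the storyline", 4.0).sentiment)  # → POSITIVE
```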
"this book was so bad" would look something like this: "this" and "book" both exist, "is" is not in the sentence, "great" is not in the sentence, and "was", "so", "bad" are all in the sentence. That's essentially what a fit-transform pass of the bag-of-words CountVectorizer does over these two phrases. One last thing: suppose you now want to transform a sentence you hadn't seen before — imagine we have "was a very great book". One small detail: "a" and "very" weren't in the original training set, so we end up dropping those terms — we don't know what to do with them, because we didn't see them when fitting our vectorizer. So with bag-of-words, if we saw this at testing time, it would be converted to a 0 for "this", a 1 for "book", a 1 for "great", a 1 for "was", and zeros everywhere else; we can't handle "a" and "very" because we didn't see them at training time. That, at a high level, is how bag-of-words works.

One last thing before we start actually using bag-of-words. In pretty much any machine learning task, you have some set of data — right now we have all of these reviews, a thousand total if I'm not mistaken (I can double-check that — yes, we have a thousand). Whenever we're building machine learning models, we want some subset of that to be training data and some subset to be test data. The common pattern with scikit-learn is that they have nice methods for pretty much everything people frequently want to do in machine learning, and in this case we want to split our thousand reviews into a training set and a test set. What I'd do on my own is literally look up "sklearn train test split" or something like that — there are a couple of different options that pop up, but usually the first result for a straightforward search is what you're looking for. So we have this train_test_split, with all sorts of information about it.

So how do we import this? Let's see some usage. We're trying to split our thousand reviews into a training set and a test set. I was just showing you the documentation, but what's actually pretty cool — someone pointed this out to me the other day — is that if you're using a Jupyter notebook like I am, pressing Shift+Tab on a function gives you the same exact documentation right in the notebook, which is pretty neat. "Split arrays or matrices into random train and test subsets" — that sounds like exactly what I want. I can go down even further and read more. One thing I'm noting is test_size: "if float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split" — so that's the first parameter I'm noticing is important. Also, we can pass in any number of arrays and it will take care of them, and you can also specify train_size. random_state is another important one: it basically lets you seed the random split, so if you want to repeat the exact same split across multiple runs, setting it to the same value gives you the same exact split every time. stratify could also be important: it keeps the proportion of class labels — in our case Sentiment.NEGATIVE and Sentiment.POSITIVE — the same, or relatively equal, in both splits, so it doesn't accidentally put all the positives in one set and all the negatives in the other.

OK, so let's pass things in. We want to pass in our reviews and give it a test_size — let's say 0.33, so 33% of the reviews will be what we use for test, which means 67% is training.
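As a quick sketch of that split — the list of integers stands in for our 1,000 Review objects, and 42 is just an arbitrary seed (any fixed random_state reproduces the same split):

```python
from sklearn.model_selection import train_test_split

reviews = list(range(1000))  # stand-in for the 1,000 Review objects

# 33% goes to test; random_state makes the split reproducible
training, test = train_test_split(reviews, test_size=0.33, random_state=42)

print(len(training), len(test))  # → 670 330
```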
We'll also give it a random_state so we get the same thing every time. And the last thing I want to look at is what this returns, so I'll highlight it and Shift+Tab again — where's the return? It returns lists: sorry, I was reading that and got a little confused at first, but however many arrays you pass in, it outputs two times that many. In our case we're passing a single reviews list, so we get two lists back, which are most appropriately named training and test. Let's run that. If we look at the length of training, we took about 66–67% of the thousand — I don't know exactly how it will round, but let's see: 670. Very close — we didn't specify exactly a third. And test should be 330. So you see how we nicely split things into a training set and a test set. Now the plan: fit our bag-of-words model to the training set, build a classifier on that training set, and then test everything on our test data.

All right, we want to pass our training set into our bag-of-words vectorizer. Let's print the first training review — OK, these are all Review objects still, so to print the first text, remember, we do training[0].text. Cool. And this is a positive review — let's just check with .sentiment, to remind ourselves of the data. OK, cool. So, thinking about what we want to pass into the bag-of-words vectorizer — or more broadly, we want to take text and be able to predict whether it is positive or negative. So our X, the thing we pass into the model, is going to be the text, and our Y is the category, the sentiment: positive or negative. It's probably worthwhile splitting our training data into train_x and train_y. To get just the text for X we can do a little list comprehension: [x.text for x in training]; and for train_y we can just as easily do [x.sentiment for x in training]. Going back to what we were just doing: if I do train_x[0], I no longer need .text, and you see the text we already saw is there; similarly, train_y[0] should tell us that positive sentiment — and as you see, it's POSITIVE. So now we've split it up into our X, which is what we pass in, and our Y, which is what we're trying to predict. For training, we use X and Y together so the model knows how to build itself; for testing, we test on just test_x, which is [x.text for x in test], and test_y is, same as train_y, [x.sentiment for x in test]. So when we want to test our model, we pass it just test_x and see how closely its output matches test_y — but that'll be in a second.

OK, sorry, I got sidetracked — back to bag-of-words vectorization. Once again, the common theme: we have this bag-of-words method we want to apply, but how do we actually do it with scikit-learn? Let's just do another quick Google search — "sklearn bag of words" is probably a safe bet. We get a bunch of results, and the first couple are probably good options; I'm going to click on "Working with text data" here.
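The X/Y split described above is just two list comprehensions. Here a couple of made-up stand-in reviews take the place of the real training list:

```python
class Review:  # minimal stand-in for the data class built earlier
    def __init__(self, text, sentiment):
        self.text = text
        self.sentiment = sentiment

training = [Review("Great storyline", "POSITIVE"),
            Review("Horrible waste of time", "NEGATIVE")]

train_x = [x.text for x in training]       # what we feed the model
train_y = [x.sentiment for x in training]  # what we want it to predict

print(train_x)  # → ['Great storyline', 'Horrible waste of time']
print(train_y)  # → ['POSITIVE', 'NEGATIVE']
```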
There's a little bit of mumbo jumbo on that page, but let's look up the bag-of-words section — "Bags of words", there it is. "In order to perform machine learning on text documents, we first need to turn the text content into numerical feature vectors" — that's what we've been talking about in this tutorial so far — and the most intuitive way to do so is the bag-of-words representation. They have a little description, and then, as we see right here, they actually do it with scikit-learn, and it looks super easy — we can basically just copy these lines. I'll copy the CountVectorizer import, and actually what's also useful is to look up CountVectorizer itself, since that's really the tool we want to use, and the docs show a nice example. What's often most helpful when I'm getting familiar with a certain scikit-learn library, tool, or method is to find the example — they always provide examples in the documentation. In this example there are four documents in the corpus, and as you can see, it extracts out these features. If you remember the bag-of-words we just went over: "this" is the last feature, and for the first document the word "this" appears, so there's a 1 in the last spot of the first row; does the word "third" appear in the first document? No, it does not, so 0. Cool, this is what we want. One thing to note that's slightly different: CountVectorizer isn't binary by default — the second document has two occurrences of "document", and that's represented by the 2 here. There is a way to make it binary — the binary parameter, right here — so non-zero counts are set to 1 instead of 2; you can play around and see what works better for your model.

Let's go ahead and actually do this with our data. One thing to note: if you want to use that Shift+Tab trick to see the docs, you have to run the cell first — it won't know what it's looking at unless you do that; I tripped myself up on that a little while ago. So, from the example: vectorizer = CountVectorizer(), copied into our cell as well, and then it does vectorizer.fit_transform(corpus) — except our corpus, instead of the little example corpus they showed, is all of our training reviews, so train_x. And what do we get when we run that? Ah, a really big matrix. It's 670 rows — one for every one of our training examples — and each of those rows has 7,372 columns. This is a really, really massive matrix, a lot bigger than the little example matrix, but if we think about it, it makes sense why: we now have 670 documents that are all longer than those example snippets. So our matrix is pretty dang big, but to our computer it's not a big deal, and we're going to be totally fine with this vectorizer.

Now that we have this vectorizer, we can start getting ready to actually build a model. What fit_transform output, what we see right here, is what we actually want to use as our training input. Before, we had train_x — let's print the first train_x, which is just a piece of text. What this now outputs is train_x_vectors, and train_x_vectors[0] is a matrix row that represents that text. Printing both: we have this text, and this train_x_vectors row has counts — as you can see here, a 2 for one specific word — at all the positions that are non-zero in our matrix. If you want to see it in a more traditional way, you can call .toarray() — this is the entire row, and because it's 7,372 wide we don't get to see the whole thing, but just know that inside there's a non-zero count wherever this piece of text triggered a term in our vectorizer.
something in our vectorizer if that hopefully makes sense all right and one thing that also I just find useful to know is that in this step right here we kind of did two steps at once we fit a vectorizer to this training data and then we also transformed our training data to give us these trained X vectors there's a you can do this the same way in two steps there's basically two separate functions you could use so you could do first vectorizer dot fit train X and that will do anything that will just fit the training data and then if you did train X vectors equals vectorizer dot transform train X then you will get the final same result so what I'm saying is that these two steps fit and transform in this function fit transform they're the exact same thing they just lumped it in because you so often want to fit a model then transform things they also have this kind of helpful to in one function but doing that in two steps would have been the same thing just useful to know because most every like SK line I think has like a fit transform fit and transform and usually you'll want to fit and transform it first but now actually when we like played around with getting our test X vectors so numerical representations alright of our test set text we just do we could just do vectorizer dot transform because we don't want to like fit a new model so you just transform it to text X okay so we have with this train X vectors now our final data basically is train X vectors and train Y and basically we'll want to fit a model around these so let's start looking at choosing the right model okay we have our X's and our wise now let's actually do some classification and SK learn offers a ton of different classifiers so I'm going to just kind of go through some of the options and this is a big part of like there's all these options for classifiers but unless you kind of study up on these classifiers and get a little bit of like a more theoretical understanding of them it's hard to 
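The fit/transform equivalence described a moment ago can be verified directly; the texts are made-up stand-ins:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up stand-ins for the video's train_x / test_x
train_x = ["good book", "bad book"]
test_x = ["good story"]

# One step: fit the vocabulary and transform in a single call
vec_a = CountVectorizer()
vectors_a = vec_a.fit_transform(train_x)

# Two steps: fit first, then transform -- same result
vec_b = CountVectorizer()
vec_b.fit(train_x)
vectors_b = vec_b.transform(train_x)

print((vectors_a.toarray() == vectors_b.toarray()).all())  # True

# Test data is only transformed, never fit -- unseen words
# like "story" are simply dropped from the vector
test_vectors = vec_a.transform(test_x)
print(test_vectors.toarray())
```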
sometimes make a decision so like here's an example classifier comparison and you can see all these different types of classifiers that are all built super easily through SK learn and what I would recommend is if you're trying to like figure out what classifier is right you know do some additional google searching watch some youtube videos on these I know like MIT OpenCourseWare the actual professor that I had when I was taking in AI class at MIT who unfortunately passed away but like if I looked up like linear SVM MIT you know you'd get all of these lectures that good information you get a more theoretical understanding so you can kind of go through and like figure out what you know get a feel for these different models and maybe have a feel for which one's gonna be better the other part of that though is you know part of it is just testing just train your model running it on the test data seeing what performs better so we'll take a couple of these fit a couple different models and see kind of how they perform so we're gonna take an SVM if I take both of these the linear kernel all that so that will be one probably do some sort of decision tree I'll throw in some sort of naive Bayes maybe we'll also do I think logistic regression regression which is not here but okay so we'll import SVM to start okay so classification start with a linear SVM and it's very straightforward to use this we can just do classifier and I'm just gonna say classifier SVM SVC support vector classifier I think it stands for oftentimes you also see support vector machine SVM so we're actually I think more generally you can do from SK learn import SVM and we can do sv m dot SVC and if I look run this and just look up the documentation here there's couple things we pass in kernel C value and yeah reading the documentation the penalty parameter C of the error term if you you know read up on the theory you'll have a little bit more idea of these different parameters it's good to know what you can 
play around with. So we're going to make this linear to start: we'll say kernel='linear'. But we could look at all the different options — kernel is one option, C is another parameter you can play with, gamma is another — all sorts of stuff. So we have kernel='linear', and that gives us a classifier, and all we have to do to fit this classifier to the X's and Y's we defined right here is clf_svm.fit — this should be a familiar command to you — fit(train_x_vectors, train_y). We pass in an X and a Y to fit this classifier to our data. There we go. And then what's cool is we can go ahead and predict something. Based on all of the training reviews we had, we can do clf_svm.predict — and real quick we need something to predict, so maybe I'll print out our test_x[0]. Let's see — this is a positive review, it looks like. We could also get the vector for that — test_x_vectors[0] — I thought I defined that... maybe not, that was train_x_vectors. So to get test_x_vectors, as we probably found out from the error, we do test_x_vectors = vectorizer... and we don't want to fit again, we just want to transform, because this is our test data: vectorizer.transform(test_x). Okay, now we have these test vectors defined, so we have something to pass in. And now with this trained SVM classifier we can predict whether that review text is positive or negative: clf_svm.predict(test_x_vectors[0]). Pretty much all of these classifiers have a predict method, and you pass in the same type of vector you used to train it. The model hasn't actually seen this test vector, so we don't know for sure what the output will be, but I'm saying it should be positive if our classifier was trained properly — and as you see, we get positive. Okay, let's go ahead
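The whole pipeline so far — vectorize, fit a linear SVC, predict — condensed into a standalone sketch; the reviews and labels below are made up (the real data is the Amazon review set):

```python
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

# Tiny hypothetical training set standing in for the Amazon reviews
train_x = ["great book, loved it", "amazing read",
           "terrible waste of time", "horrible book"]
train_y = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

vectorizer = CountVectorizer()
train_x_vectors = vectorizer.fit_transform(train_x)

# Linear-kernel support vector classifier
clf_svm = svm.SVC(kernel="linear")
clf_svm.fit(train_x_vectors, train_y)

# Unseen text gets transform (not fit_transform), then predict
test_vec = vectorizer.transform(["loved this great book"])
print(clf_svm.predict(test_vec))
```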
and create a couple of other classifiers. I mentioned a couple I'll try — there are so many options to choose from — so I'll do a decision tree, we'll have a naive Bayes option, and we'll do a logistic regression, and we'll just do these rapid-fire; it's pretty cool how easy it is. Okay, decision tree first. If we go back to that comparison page we see DecisionTreeClassifier here; I can click on that and get the full documentation. Look at the example — it has a fit and a predict call. Okay, so from sklearn import that, and then basically we're copying the same code as before — ah, no, I didn't want to do it there, I wanted to do it here, insert below, copy. So doing the decision tree real quick, we're going to want to fit that: clf_dec = DecisionTreeClassifier() — and again, you can look up the docs for this and see what options you have, and play around to find whatever works best. Then clf_dec.fit, and we fit it to the same exact data: train_x_vectors and train_y. Let's see what happens. So we have another classifier — we've just trained a decision tree classifier — and we can do the same type of thing: let's see if it predicts properly. As you see, we just built two different classifiers super quickly, and this one says positive too. Next is naive Bayes — and I'm going to keep saying this: read up on these different classifiers, because otherwise you're just picking randomly, with no thought process in choosing one over the other. If you have some understanding of how these work behind the scenes, you'll do a better job at model selection. So, naive Bayes — this time we'll just search for something like "naive Bayes sklearn"; there's all sorts of
documentation on all the different types of classifiers here. I want naive Bayes, so we'll try this GaussianNB — Gaussian naive Bayes. Okay, now basically just repeat the same thing: paste in this code; this will be clf_gnb — Gaussian naive Bayes, that's like a tongue twister. So what happens here? Same type of deal, cool. Logistic regression will be the last one I rapid-fire: search "logistic regression sklearn"... okay, cool. I'll call this clf_log for logistic, and then we'll just copy this. What happened? Okay, so there are some future warnings, but everything's fine, everything's predicting positive — that all seems good. Okay, now that we have some trained models, let's actually evaluate them a little more comprehensively. Let's look at the entire test set and see how accurately we predict every one of those test examples. Above, we seemed to predict positive correctly for every one of these models, which looks good at least initially — but how can we check this more comprehensively? Let's see what functions we can call on a classifier we've trained — maybe I'll just look up "SVM sklearn"; there's a specific function I'm looking for, and I just want to show you where you'd find it in the documentation. Okay — score, that's it. We pass in an X and a Y, and it will predict on all the X's and see how the predictions compare to Y. So score is what we're looking for, and if I clicked on score you'd probably see an example of how to use it — maybe not, but you could look up a score example. Here's how we use it: take our SVM classifier and pass in the test_x_vectors — that's what we want to predict on — and see how the predictions compare to our actual test_y. So clf_svm.score — now it should work: 82%, not too bad. Let's see how the other ones do. What was the second
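The three rapid-fire classifiers can be sketched together; note that GaussianNB expects dense input, so the sparse matrix is converted. The data here is made up:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in training data
train_x = ["great book", "amazing story", "terrible book", "awful story"]
train_y = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

vectorizer = CountVectorizer()
train_x_vectors = vectorizer.fit_transform(train_x)

clf_dec = DecisionTreeClassifier(random_state=0).fit(train_x_vectors, train_y)
# GaussianNB needs a dense array, hence .toarray()
clf_gnb = GaussianNB().fit(train_x_vectors.toarray(), train_y)
clf_log = LogisticRegression().fit(train_x_vectors, train_y)

test_vec = vectorizer.transform(["amazing book"])
print(clf_dec.predict(test_vec))
print(clf_gnb.predict(test_vec.toarray()))
print(clf_log.predict(test_vec))
```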
one I did? I think decision — clf_dec.score, passing in the same exact vectors, so we're just quickly checking the accuracy. This score function, if we look at it here, returns the mean accuracy on the given test data and labels — sounds good. We can print all of these: that's the first one, the SVM; then let's just copy these two lines to get our last two. Okay, we had the Gaussian naive Bayes and the logistic regression. Let's see how they all do: 82, 78, 78, 84. All very good, which is pretty nice — we didn't do much at all. But is there a catch? And I'm going to tell you right now: there is a catch. We just looked at accuracy, and accuracy is one thing — how many labels did you actually predict correctly — but the more important metric we usually care about as data scientists for classification tasks, I'd say, is the F1 score. I'm just going to shorten this line up — so this right here was the mean accuracy on all of our test labels; let's see what the F1 scores are for these same exact classifiers. To use f1_score in sklearn you import it like this, and then if I run that we can look at the signature by doing shift-tab in the Jupyter notebook. What does it say? "Compute the F1 score, also known as balanced F-score" — and it gives you a little information; if you don't know much about the F1 score and how it's calculated, it's worth looking into, because it is a good measure. We need to pass in y_true and y_pred, and optionally we can pass in some labels — labels might be helpful for us. So first the y_true — for all of our models this will be the same, just test_y — and then the predicted y, which is clf_svm.predict of all of the test_x_vectors. So we do it like that. Let's see what our F1 score is for the linear SVM — ah,
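To see why F1 matters more than accuracy here, a toy example with hypothetical labels — a model that always answers POSITIVE on an imbalanced test set:

```python
from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy test set: the "model" always predicts POSITIVE
y_true = ["POSITIVE"] * 8 + ["NEGATIVE"] * 2
y_pred = ["POSITIVE"] * 10

# Accuracy looks respectable...
print(accuracy_score(y_true, y_pred))  # 0.8

# ...but the per-class F1 scores expose that NEGATIVE is never predicted
print(f1_score(y_true, y_pred, average=None,
               labels=["POSITIVE", "NEGATIVE"]))
```

The second array has a decent first entry and a 0.0 second entry — exactly the failure pattern about to show up on the real model.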
what happened? "Target is multiclass but average='binary'." Okay, we'll try no averaging — I just want to see each individual class's F1 score — so average=None. Okay, cool. So what do these numbers mean? It's a little clearer if I pass in labels. So labels is going to be a list of sentiments — and with these three labels I notice we actually forgot to filter out our neutral labels, so we've been building a classifier for positive, neutral, and negative instead of just positive or negative. We'll probably switch that — I think we will, and we'll go through how — so: Sentiment.NEUTRAL and Sentiment.NEGATIVE, just to make things clear. One thing to know: if I switch the order of these labels, we can tell which score is associated with which class — if I put positive at the back, you'd see the 90 switch to a different location, as you see. But let's go back, because it was clearer before. Okay, so we have an F1 score of 90% for positive — very good for positive — but it is trash for the neutral and negative labels. We want to improve that, because yes, we can predict positive, but we want to be equally able to predict negative. Basically, right now our model is almost always predicting positive, and it's often right just because of our test data split. Okay, let's see if this holds true for the other models too — try the decision tree... oh wow, okay, that did very well on positive, and again awful on neutral and negative, so this might be the common trend. Gaussian naive Bayes — same thing: good on positive, bad on negative and neutral. And finally logistic regression — it almost looks like it's good on negative, but that's a 0.093, not 0.93. Okay, so very good on positive — they're all very good on positive. So now we've discovered something: how
can we make our model better on not just positive examples so that's what we're gonna work on next okay when all these models perform like equally as bad on neutral and negative you know I'm not really thinking that it's a model issue right now I'm thinking it's more of a data issue so we're gonna do a little bit investigating into our data and just see if we can find anything so let's look at her training data so train X I guess yeah train ax so remember what train X was that was all the labels that we used and we ended up actually converting them to vectors think I'm more curious about is train Y so let's just look at the first like five elements okay and this already is telling this isn't our training data we have five positive things in a row we could honestly probably let's look at the entire list okay five right in the first five okay that's kind of telling we can do train Y dot count of sentiment dot positive and let's see how many actual positive labels we have 552 okay so remember we had 670 train labels total in five hundred fifty two of them are positive so right away our models are gonna be like heavily biased towards these positive labels because there's way more of them so we do the same thing for negative forty seven negative labels so and I guess the rest would be neutral so here's our issue we need to let's balance our negative and positive data so let's do that real quick because it wouldn't make sense to balance the 47 like if we balanced it exactly with 47 then we'd have 47 positive that seems like a little bit too little amount of training data so what I want you to do real quick is well you download a bigger data set if you go to my github go to data sentiment download this 10,000 file that's actually ten thousand reviews instead of dia thousand those originally in that this and if you're curious on how I made this ten thousand file the way was I used this data process file I wrote and basically looked at all the reviews in this massive file 
that I downloaded from this what I brushed on here so right here I downloaded that unzipped it then ran this file on it changed this final data to ten thousand took ten thousand samples randomly and then just basically change the name here so it's ten thousand examples so we should be able to get a little bit more negative examples out of that so alright so you download the download the data sentiment book small click on it view raw and I guess you could even just click and then you would save as and save it just like you saved the first file okay and now what we need to do is go ahead and instead of loading in up here books small you need to load in whatever you named it I called my book small JSON or book small ten thousand a song so I'm gonna run that I'm gonna do all the same stuff with this I'll do the same sort of vectorization now it's going to be a bigger vectorization and really when we go down here to the count of negatives we should see a higher number 436 cool that's a lot better in my opinion you know about 10x for our training data so we have more negatives but that also means that we have more positives 5611 so what we're going to do is actually create a little container class for reviews so we can like call it evenly distribute method that will even out our training data in our test so we'll do that in the same spot that we created this original data class so instead of we have this review class and now what we're gonna do is add one more class and that's going to be review container and to initialize that we will want self as always and then reviews and the reviews we already know what our self dot reviews I equals reviews and we can create a method called and I'm making this a class just because it makes things neater we'll create a method called evenly distribute it takes himself and what that will do is count the number of it will count the number of negative sentiment things and it will okay so will write this and just bear with me so we want 
to get all of our negative examples from the reviews. What we'll do is filter our reviews list based on which sentiments are equal to negative. So — bear with me a little — filter(lambda x: x.sentiment == Sentiment.NEGATIVE, self.reviews), run over the entire reviews list. I think that's good, and I'll just print negative real quick so we can see what this is doing. Basically it's looking at all these reviews, checking every sentiment, and filtering on whether the sentiment is negative — whenever the sentiment was negative, we keep that review in the negative list here. We can do the same thing for positive: positive = filter — I'll just copy this line — and we want this to be positive. At this point we're going to ignore the neutral examples; you could factor those in if you wanted to, but I'm just going to make sure my number of negative values equals my number of positive values — I think this should make our model a little bit better. Okay, so now if I want to print out the lengths of negative and positive, I can rerun this ReviewContainer, pass in all my reviews somewhere down here — like a ReviewContainer of the training data — and run container.evenly_distribute(). Okay, so we have positive and negative in here, and just to see what's happening we'll do print(negative[0].text), print the length of negative, and print the length of positive. So what happens when we run this? "'filter' object is not subscriptable" — that's because we need to convert it to a list: when you filter something it doesn't automatically convert back to a list, so you need to surround it with list(...). And now what happens? Cool — it gives us the text from the first negative review and shows us that we have
436 pieces of text and there's 5611 positive so what we're going to do is just basically shrink the amount of positive examples to be equal to the length of the negative so that is equal to positive length negative and I guess yeah okay so let's give a little length and negative and now what we're gonna do is our final reviews is going to be equal to negative plus positive shrunk so now what we're basically doing is shrinking the amount of reviews we're actually storing and just only containing the negative ones and the positive ones that are equal to the amount of the negative and just for a good measure we're gonna also import random I think it might have this might be overkill it might be doing this twice and we're gonna shuffle those reviews just make sure that our data is kind of evenly and random when the negatives and the positives come actually yet we do need to do this because we're with otherwise we'd have all the negatives and then all the positives so we're just going to shuffle up the order so that you don't know if a negative or positive is coming next some of the algorithmic models will this will be important for okay cool so we're gonna rerun that hopefully everything works and remember we imported random here too okay now if we get on to our evenly distribute and if we run that are actually that's container dot reviews and we'll just get the lengths that we'll just see if it the total number of reviews now that we're looking at are smaller and then I'll do a couple additional thing okay 872 so that sounds like exactly in the middle of you know how many negative there were times two okay so now instead of doing this we you know kind of bake a lot of this stuff into our container type now so I'm gonna just call this like trained container and we'll make a test container too and you'll see all of what I'm doing in a sec and I want to note - real quick part of the reason I'm doing this and breaking things out into classes it's not necessary but like 
real-world — this is a real-world tutorial, and it's always important to keep your stuff neat; even though this is easy enough to do once, if we have to do it a lot of times it would get annoying. So we'll bake some of this breakdown into our ReviewContainer class. We can add a couple of new methods: def get_text(self), which just returns the texts — return [x.text for x in self.reviews] — and a way to get our y's, our labels: def get_sentiment(self), which returns [x.sentiment for x in self.reviews]. And now, instead of the earlier code, train_x just becomes train_container.get_text() and train_y becomes train_container.get_sentiment() — same thing for the test set: test_container.get_text() and test_container.get_sentiment(). Okay, and now we can check counts in our train_y: train_y.count(Sentiment.POSITIVE) and train_y.count(Sentiment.NEGATIVE)... "ReviewContainer has no attribute" — I don't think I ran this cell again; run, and run, and run. 436 is negative; let's see about positive — 5611. Okay, that didn't quite do what we wanted, and that's because we also need to real quick run that function we wrote, evenly_distribute. Now we should be good: positive is 436 and negative should be the same. That was a fairly long aside, but we got where we were going. Now, does that help us out? And I don't think it's important to evenly distribute the test set, but you could if you wanted to — it really depends on what you expect your incoming data to look like in practice. All right, so we have this now equal, so we're going to vectorize it the same way using our slimmed-down train_x and train_y — actually train_x_vectors, okay, that's good, and test_x_vectors. We could also bake this into our ReviewContainer class if
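Collecting the pieces, the ReviewContainer built over the last few steps might look like this; the class and attribute names follow the video, while the sample reviews are invented:

```python
import random

class Sentiment:
    NEGATIVE = "NEGATIVE"
    NEUTRAL = "NEUTRAL"
    POSITIVE = "POSITIVE"

class Review:
    def __init__(self, text, sentiment):
        self.text = text
        self.sentiment = sentiment

class ReviewContainer:
    def __init__(self, reviews):
        self.reviews = reviews

    def get_text(self):
        return [x.text for x in self.reviews]

    def get_sentiment(self):
        return [x.sentiment for x in self.reviews]

    def evenly_distribute(self):
        # filter() returns an iterator, so wrap each result in list()
        negative = list(filter(lambda x: x.sentiment == Sentiment.NEGATIVE,
                               self.reviews))
        positive = list(filter(lambda x: x.sentiment == Sentiment.POSITIVE,
                               self.reviews))
        # Drop neutral reviews and shrink positives to match the negatives
        positive_shrunk = positive[:len(negative)]
        self.reviews = negative + positive_shrunk
        # Shuffle so a model never sees all negatives, then all positives
        random.shuffle(self.reviews)

# Tiny made-up data set: 5 positive, 2 negative, 1 neutral
reviews = ([Review("great", Sentiment.POSITIVE)] * 5 +
           [Review("bad", Sentiment.NEGATIVE)] * 2 +
           [Review("ok", Sentiment.NEUTRAL)])
container = ReviewContainer(reviews)
container.evenly_distribute()
print(len(container.reviews))                               # 4
print(container.get_sentiment().count(Sentiment.POSITIVE))  # 2
```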
we wanted to but it's fine like this for now okay now we're fitting to the new data and we can fit all these to the new data and let's see what happens to our scores come on baby okay our scores decreased is it a good thing or bad thing I mean normally you'd say that's a bad thing but let's see our f1 squares how do they look and our f1 scores I mean overall they seem to get a bit better I'm not gonna look at logistic regression let's look at the SVM for now okay I mean it got definitely better it's still not great but it got better so what can we do next let's like keep getting better and better so the first thing I'm going to say we should do and you know again you as the data scientist we get it cannot control the data I'm gonna say that I want a model that kind of does a good job predicting one about half and half are positive and negative so right now in our test set we have an overwhelming amount of positive I can show you that test why account is 2767 for positive and now we have 208 for negative so why did this f1 score for negative and also we dropped a neutral out of this completely so I could actually just kind of delete that for my labels and also for the time being let's just focus on this SVM because it's hard to go through every one of these I'd say in general for this model probably logistic regression or the SVM will be your best performing okay so the reason that this is low is because we trained it equally but in our tests that we only have 208 negatives so like our model is a little bit more custom to like predict 5050 and you know it really depends on what you're looking for and there's also ways to probably make it a little bit more robust where even though you train with about 50/50 you still could handle that uncertainty but I'm going to just go ahead and start out by doing the evenly distribute for the test container as well so now we'll get about 200 for both positive and negative and everything else should stay the same here nothing 
changed, nothing changed here, all this stays the same. Now we just go back and see what happens to our F1 score now that the test set is about half and half. Basically, what was happening before is we were probably getting a lot of false negatives — something that was actually a positive comment we'd sometimes predict as negative, because we'd trained on roughly equal positive and negative. So let's see what happens now that we've evenly distributed the test set too — sorry, this is not the model I'm running — but as you can see, 208 and 208. Okay, let's see what happens to the F1 score for negative. It should score negative at a higher rate now... huh, I don't even know what happened — I think I didn't reset things properly, I was doing some weird stuff. Okay, what's going to happen now? Come on, work... okay, cool, and look at that — yeah, it shot way up. It wasn't that the model started predicting negative more often — it was predicting negative just as much — but our actual test data is now more evenly distributed, and I'm saying that's what we wanted. All right, so we've got our base model working, and one thing that's probably good to do is a little qualitative analysis as well. We can take a test set such as the one I showed before, transform it using our vectorizer, and use the SVM to predict what the labels should be — and as we see: positive, negative, negative. It's really fun to play around with this. "Great" — positive, still. If I type "not great" — I'm actually curious what this will say; I think "great" will outweigh it, because we didn't actually use bigrams, which would count "not great" as one token — oh wow, "not great" came out negative, that's pretty impressive. "Very good book" — positive. "Very brilliant"... okay, I guess it didn't know the word "brilliant", and that could probably be fixed with more training data. Very fun. So it looks
pretty good to me, but let's make this even better — sorry, I'm going insane. Also, really quick, I'm going to refresh all these scores since we changed up our test data — and as you see they increase, because they're now predicting positive and negative both more accurately. Okay, so can we drive these scores up even higher? The first way we're going to do that — and this should give you a feel for vectorizers — is to scroll back up to our CountVectorizer. Think of the examples I showed when I was first introducing bag of words: "this book is great" was one, and I think the other was "this book was so bad". With the CountVectorizer, the main issue is that it weights each word equally, even though certain words don't carry nearly as much meaning. In this case "this" and "great" would each be weighted as one count, even though "great" is the word that defines the sentiment and "this" has no real meaning. So instead of a CountVectorizer we can do something smarter and use a TF-IDF vectorizer — that stands for term frequency–inverse document frequency. Basically, the term frequency part gives a term more weight if it occurs frequently within a review, and inverse document frequency means a word is less important if it occurs in a lot of documents. For example "this", "is", "was" — in a big corpus of documents, those words appear a ton, so their weight ends up less than "great", which occurs in fewer documents and ultimately plays more of a role in the meaning of the document — or the review, or whatever you want to call these. So TF-IDF lets us weight "great" and "bad" higher than words like "this" and "book", and that should ultimately help our performance. All we have to do to make this change is swap in TfidfVectorizer; we should be able to run this, rebuild our models, and — oh
no — I updated it really quickly, but if I'm not mistaken, by doing that — focusing just on the SVM — it went up by about a percentage point, it looked like. So it did something, and you could probably tweak it further and make it even better. I think logistic regression actually went down, but — I mean, this is machine learning in general: you try something, maybe it helps, maybe it doesn't; try to be smart about what you do, but you always end up playing around and testing things. Our SVM went up, okay, that's cool — that's one thing we can do to increase performance. Let's see what happens to the F1 scores for the SVM — yeah, they seem to go up for both positive and negative sentiment, that's cool. I'm guessing these others will stay the same — hopefully — yeah. All right, so now we're going to try to increase our accuracy even further, and we'll do that through a method called grid search. To access grid search we can do from sklearn.model_selection import GridSearchCV — and if I were uncertain what this was called, I'd probably google something like "parameter tuning sklearn" and find it that way. Okay, so GridSearchCV — what does it do? Well, let's go back to the svm.SVC model we used, and real quick look up its docs. As you see, there are a bunch of different parameters on SVC: the C parameter, kernel, degree, gamma, all this stuff. And to be honest, especially for things like the C value, or maybe the kernel, I don't have a great feel for which values will be best for my data. So is there a way to programmatically test a lot of different options and choose the best one for me? That's exactly where GridSearchCV comes in. So we'll do something like tuned_svm equals — and how do we use grid search? I'll actually get back to that in a second; comment this out real quick. Well, what were the parameters we were looking at? We'll
see: C, kernel, gamma. Let's just focus on the C value and kernel value right now — and remember, for kernel we were using 'linear', but you could also use one of these others. So we're going to pass in some parameters. What are our parameters going to be? First off we'll have kernel, and for now I'm just going to list two kernel options — this is a dictionary mapping 'kernel' to our options — so we want 'linear', which is what we're currently using, and there's this other one that's the default, 'rbf', which is radial basis, I believe. So kernel will be one of our parameters. We can also pass in a C value — the default is 1, I think, as you see here, but maybe 1 isn't the best value, so we can pass in a bunch of different values: 1, 4, 8, 16, 32, etc. And basically what's going to happen when we use this grid search is — I'll say svc = svm.SVC(), then clf = GridSearchCV of the classifier we want to pass in, so that's this right here, then we pass in the parameters, and then we can optionally pass in this cv value, which is basically how many folds to split the data into to cross-validate and make sure things work well with each specific parameter setting on our training set. So now we have our classifier defined, and we can do clf.fit(train_x, train_y) — we're fitting it just like we fit all these other models up here — train_x_vectors, I guess, I have to pass in train_x_vectors and train_y. Okay, so now what's going to happen is it will check the linear kernel — for the linear kernel it will check all of these C values and figure out the best option — then it will also check the rbf kernel with all of these C values, and then it will choose the kernel and C value that it predicts will do best on unseen data. So let's fit that — this sometimes takes a little while depending on how many parameters you pass in; I don't
actually know how long this will take, we'll see. All right, it worked, okay, cool. So it found, after doing this, that the C value we put in of 1 was fine, but it actually changed our kernel value: it recommended the radial basis (rbf) kernel. So now we have a new classifier, and if we went ahead and did the same scoring method that we've done on the other models, and I'll do this in the next cell so we don't have to rerun this, okay. So remember, our SVM before was 80.7 percent accurate, and now let's see what happens after we do a little bit of fine-tuning. Oh, it's about the same, yes, that rbf kernel didn't do much. Kind of coming down to the end of this first model, I mean, there's room for improvement here. A couple things you could do if you wanted to improve any of these models further, and I'll just focus on this SVM that we just fit, though I guess this is really pretty general for all of them: think about what our texts look like. One thing you could potentially do is just strip out any of the stop words, maybe that would help a bit. One thing I noticed that might be a problem is that words like "good!" and "good" would probably be treated, I think by default, you know, you're not stripping out punctuation, so "good explanation!" and "good" would be treated as different words. So one thing you could do to make your model better is to strip out all of the punctuation marks. What else can you do? You could look at exploring more complex things than just doing bag of words and tf-idf vectorization. There's a lot of cool state-of-the-art language models out there: there's OpenAI's GPT that maybe you could do something fun and cool with, and there's Google's BERT, that's also good. Those are both topics for maybe future videos, but this also isn't a natural language processing video, so I'm not gonna cover those further here. So let's talk about saving the model. Okay, so we have this
classifier right here that we want to save so that we don't have to retrain it the next time we want to use it. Well, we can do that very easily with this library called pickle. You should have pickle by default, it's part of the Python standard library, so you shouldn't need to install anything. And basically what we can do with this pickle library is: I can do with open... I've created some directories, let me see where I am one sec, so in my sklearn directory I have this models folder, and I'm gonna save a model here, so models/sentiment_classifier.pickle, that's what we're going to save our classifier as. So with open on that path, and we need to write there, so we're gonna open a writing buffer ('wb') as f, and we can do pickle.dump(clf, f). So this is taking our classifier that we were using up here, basically the SVM but with some tuned parameters (that ended up being basically the same parameters, giving us the same result), and we're dumping all of its parameters into this file. So I can run that. Now what's cool is that I can go ahead, and even if clf wasn't defined, or if I wanted to define it as something else, I can just simply load in the model. So I could go ahead and do something like... I need to open that file, so I can just copy this open, and this time I'm reading it, so I should use a read buffer ('rb') as f. We want a classifier, so I'm just gonna call it loaded_clf, and just so you're clear, if I was trying to do loaded_clf.predict right now, that would not be defined, so it wouldn't work. loaded_clf = pickle.load(f), and I just need to load that file, do that. Now I can do loaded_clf.predict(test_x_vectors[0]), so just the first one. What is the first one? test_x[0]. As you see, it did output something, so we have a review that looks like this, and without training a model first, we were able to just load this pickled file and use it just as we were using the classifier
before, after the training. So that's pretty cool, and it's very, very useful, because if you're training these models and you want to be able to use them in production, then using pickle, by dumping your trained models and then reloading them, that's how you're gonna do it. Because this tutorial is getting pretty long now that we've gone through everything, I'm gonna kind of skip over the category classifier, I'll go through it really quickly. It is on my github, so if you want to see the exact details check there: if you go to KeithGalli/sklearn and you go into the category classifier, you can download this too. You'll see that it's basically the same thing overall as what we did, with a couple differences: for example, I added an enum class for the categories. Then we go through and load in data from all these different files, and all that data can easily be found under data/category, you can download all of these files. So it goes through each one of these files, loads in all the data associated with that file, and then based on which file it loaded from, it sets a specific category, and that can be seen here too, where it has the reviews. It preps the data pretty similarly. One thing you'll notice is that evenly_distribute is commented out, because for this one, since we're looking at just which category (so electronics or clothing or grocery) some review comes from, we don't care whether the positive and negative stuff is evenly distributed, at least I don't think it would be that important. We use the tf-idf vectorizer just like before, here's some classification stuff, pretty similar, and the f1 score actually does very well, as you can see here; all of these are hitting pretty accurately across the board, so that's pretty impressive to see. I do the grid search again, and honestly, even though you had all these options to choose from, it just gets you back to C equals 1 and kernel equals rbf, but
it did check all of these. Finally, I save the pickle file, and in this case I also save the vectorizer, because it's nice: basically, when you want to do quick, kind of qualitative tests like this, you want to use the vectorizer as well, so I figured I'd save the vectorizer too. Those pickle files you can download from my github as well, they're in the models directory. Then the last thing I did here, which was kind of cool just because it fit well, was a little bit of a confusion matrix. So you can look at this code: basically, as you see, stuff across this diagonal is, so on the one axis (I'd have to double-check this actually) one of these is the y predictions and the other is the y actuals. As you see, the y predictions and y actuals are very close, but maybe sometimes, you know, you mistake clothing for electronics, you mistake books for grocery, like, there are mistakes, and that's what the confusion matrix shows. But you could use this code if you want to replicate some confusion matrix type of stuff. All right, that's all we're going to do in this video. Thank you guys for watching, if you enjoyed this video make sure to throw it a big thumbs up, and it would also mean a lot to me if you don't mind subscribing, really, just seeing more subscribers motivates me to make more videos. Also check out my socials, my Instagram and Twitter, and then the final thing I wanted to say is, if you like this dope shirt that I'm wearing, you should definitely check out my buddy Sebastian; his Instagram page, where he makes all this cool content, is right here, and I'll also link it in the description. Thanks again guys, peace out [Music]
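The grid-search step walked through above (trying combinations of kernel and C for an SVC) can be sketched roughly like this. The toy reviews, labels, and the tiny cv value are made up for illustration and are not from the video:

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV

# toy reviews and sentiment labels, made up for illustration
train_x = ["I thoroughly enjoyed this, 5 stars", "bad book do not buy",
           "very interesting, loved it", "horrible waste of time",
           "great read, highly recommend", "awful, really disappointing"]
train_y = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

vectorizer = TfidfVectorizer()
train_x_vectors = vectorizer.fit_transform(train_x)

# the parameter grid from the video: two kernels and a range of C values
parameters = {"kernel": ("linear", "rbf"), "C": (1, 4, 8, 16, 32)}

svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=2)  # cv kept small for this toy data
clf.fit(train_x_vectors, train_y)

# the cross-validated winner across the grid
print(clf.best_params_)
```

After fitting, `clf` itself acts as a classifier refit with the best combination, which is why the video can keep calling `clf.predict` on it.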
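The punctuation-stripping improvement suggested above could look something like this minimal sketch; `strip_punctuation` is a hypothetical helper, not code from the video:

```python
import re

def strip_punctuation(text):
    # drop anything that is not a word character or whitespace, then lowercase,
    # so "good explanation!" and "good" share the same token "good"
    return re.sub(r"[^\w\s]", "", text).lower()

print(strip_punctuation("Good explanation!"))  # -> "good explanation"
```

You would apply this to each review before handing the texts to the vectorizer, so the bag-of-words vocabulary no longer splits on punctuation variants.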
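The pickle save/load pattern used for the sentiment classifier follows this shape; here a plain dict stands in for the trained model, and the temp-file path is an assumption (in the video the tuned SVM is dumped to models/sentiment_classifier.pickle):

```python
import os
import pickle
import tempfile

# stand-in for a trained classifier (in the video this is the tuned SVM, clf)
clf = {"kernel": "rbf", "C": 1}

path = os.path.join(tempfile.gettempdir(), "sentiment_classifier.pickle")

# save: dump the trained model to a file ('wb' = write binary)
with open(path, "wb") as f:
    pickle.dump(clf, f)

# load: read the model back later without retraining ('rb' = read binary)
with open(path, "rb") as f:
    loaded_clf = pickle.load(f)

print(loaded_clf)
```

A real sklearn estimator round-trips the same way, which is why `loaded_clf.predict(...)` works immediately after loading.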
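The confusion matrix described at the end can be reproduced with sklearn's `confusion_matrix`; the actual/predicted category labels below are hypothetical examples, not the video's data:

```python
from sklearn.metrics import confusion_matrix

# hypothetical actual vs predicted category labels
labels   = ["BOOKS", "CLOTHING", "ELECTRONICS"]
y_actual = ["BOOKS", "BOOKS", "CLOTHING", "ELECTRONICS", "ELECTRONICS", "CLOTHING"]
y_pred   = ["BOOKS", "BOOKS", "CLOTHING", "ELECTRONICS", "CLOTHING",    "CLOTHING"]

# rows are actuals, columns are predictions; the diagonal counts correct calls
cm = confusion_matrix(y_actual, y_pred, labels=labels)
print(cm)
```

Off-diagonal cells are exactly the "mistook clothing for electronics" style errors the video points out; plotting this matrix as a heatmap gives the figure shown on screen.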
Info
Channel: Keith Galli
Views: 125,945
Rating: 4.9718103 out of 5
Keywords: Keith Galli, MIT, sklearn, python machine learning, nlp, machine learning project, artificial intelligence, sci kit learn, sci-kit learn, AI, python 3, jupyter notebook, data science, ML, python data science, model selection, classification, regression, algorithms, sklearn overview, machine learning in python, python programming, programming, advanced, simple, complete, save model, confusion matrix, python plotting, sentiment, natural language processing, project, machine learning
Id: M9Itm95JzL0
Length: 100min 48sec (6048 seconds)
Published: Mon Sep 30 2019