First hour with a Kaggle Challenge

What's going on, everybody? I hope everybody is being safe, smart, and socially distant. Today I want to draw your attention to the following Kaggle challenge. It's obviously highly relevant right now, but I also find it much more like the kinds of tasks businesses have needed when they've come to me for contracting and consulting: the data tends to look more like this than your average Kaggle competition. Nothing against Kaggle competitions, there's definitely real competition there, but generally Kaggle comes down to a competition of optimization rather than a competition of finding insights in highly unstructured data. Sometimes that's not true, but for the most part that's what you find on Kaggle, because Kaggle has to be objective, and when it enforces that objectivity, people will optimize for those objective results. Anyway, enough on that. This is a very interesting data set. I've downloaded it and barely peeked at it, and that's why I decided it would be really good for a video. We're going to dive in via code, and you'll see firsthand the process I at least begin with when looking through a data set. With this kind of data set you could spend anywhere from 20 to hundreds of hours going through it trying to get insights, so at best we'll see the first hour here. Don't expect anything too crazy; this is just my process. To get the data, make an account and download it. They also have various tasks here, which we'll talk about in a little bit. As you've likely found, the problem is that there's so much information: if we look at the data, it says there are twenty-nine thousand articles, thirteen thousand of which are full text, and these are scholarly articles, which is important. There's so much information, and so
much new information is being pushed out very fast. There's historical information and new information, and it's very difficult. As you've probably found yourself when trying to learn about this, the variability in the information and facts you hear is really high, and on top of that there are so many little details that are really difficult to pin down. Given the critical and acute nature of what we're going through, that makes it very hard. Normally it would be grad students who sit and go through all of this, but we just don't have time for that. Businesses are in a similar spot: either it's you, or they hire some interns to try to answer these questions, but it turns out programming can often do a better job at it. So let's dig in. We'll talk about the tasks in a moment, but first start the download, and once it's downloaded we'll go through the actual data. You can either pause or, while your download is going, look at my files. You'll get a zip; unzip it and you'll get this. The last update was March 13th, and it's March 19th as I record this, so hopefully they'll update it at some point. Coming in here, we get four directories. Initially I expected one of them to be the full thing, and then a commercial subset, a non-commercial subset, and some sort of custom-license subset. But when I click in, I see 803 items in the first one, 9,000 in the next, almost 2,000 in the next, and about 1,400 in the custom license one. That tells me right away that we're going to have to go through all of these directories. Then, inside each directory, we have another directory (keep this in mind) with the exact
same name as the parent directory. Who knows why. Inside that are the JSON files, so let's open one so we know what we're looking at. Already, JSON is not what I expected; maybe this comes from some sort of JSON-based database, I don't know. What we have is clearly your typical keys and values. Scrolling down, the first body of text we find at all is called abstract, so we can grab the abstract. Noted. Then we have body_text, and we can quickly see that it comes in chunks: each entry has a text field, plus cite_spans, ref_spans, and a section. In this first one there's really nothing in the cite spans or ref spans, but further down we do have some cite-span entries. I'm not sure what those are; maybe character positions, no clue. Clearly, though, we're looking for the chunks of text inside body_text. I think we have a general idea now, and that's about as much as I want to pore over the JSON files, so we're ready to begin coding. If you haven't fully downloaded the data yet, now would be a good time to pause. The first thing I'm going to do is make a new Python file with nano and open it with Sublime (I need to set Sublime as my default editor). Going through every directory will take time, so let's just start with the top directory. I'm going to grab its name for now; later we'll probably go through all four so we get all of the data possible. So dirs will be a list containing that name, and then for d in dirs we loop. Let's import os, and let's make sure we get all the files.
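Since every subset folder holds a nested folder with the same name, and that inner folder holds the JSON files, walking the data can be wrapped in a small generator. This is a minimal sketch assuming that layout; the function name iter_papers is hypothetical, the subset names are whatever folders your download contains, and os.path.join is used only as a portable way to build paths (the video builds them with an f-string and forward slashes).

```python
import json
import os
from typing import Iterator, List, Tuple


def iter_papers(root: str, subsets: List[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (path, parsed JSON) for every .json file in root/<subset>/<subset>/.

    Assumes each subset folder contains a nested folder with the same name,
    as described in the video; adjust if your download is laid out differently.
    """
    for subset in subsets:
        folder = os.path.join(root, subset, subset)
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".json"):
                continue  # skip anything that is not a paper file
            path = os.path.join(folder, name)
            with open(path, "rb") as f:  # json.load also accepts binary file objects
                yield path, json.load(f)
```

Printing list(paper.keys()) for the first file yielded is a quick way to see top-level keys such as paper_id, metadata, abstract and body_text.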
The first thing we want to do is walk through all the files: for d in dirs, and then for file in os.listdir(f"{d}/{d}"). I'm using an f-string with forward slashes here. Some people have said a forward slash won't work on Windows, but in my many years of working on Windows I've found that forward slashes in f-string paths do work. If you're on Windows with Python 3.7 or whatever and that's not working, please let me know, but it should work. (Hopefully that noise wasn't a problem; it looks like we're still recording. We'll find out.) We know the files live in directory/directory, so we just go through them. First, let's just print file and see. I don't have everything set up yet on this machine, since it's relatively new to me, so I'm going to run everything from the terminal. Just to make sure, python for me points to 3.7. Run the file, and look at all those: we're able to find our JSON files. Very good. The next order of business is to open one, so let's import json. The path will be that same directory path plus a slash and the file name. Then we say j = json.load(open(file_path, 'rb')). Let's print j and add a break so we don't get too crazy; we just want to make sure things work as expected up to this point, and they do. Now let's do a quick for p in j: print(p) to get a general idea of the keys in this document. We've got paper_id, metadata, abstract, body_text, and so on. Let's look at a couple of those. So let's print j['metadata']. I
think I already know what body_text will look like; that looks pretty clear to me. Metadata appears to contain quite a few things: a title (this one apparently has no title) and authors. Oh gosh, that's a lot. Let's do for k in j['metadata'] instead. This would almost be nicer in a notebook, but I didn't really plan for that. Okay, so it's really just title and authors. That's a lot of output for only a title and authors, but sure. So title = j['metadata']['title'], and abstract = j['abstract']. Next, let's print the abstract real quick. I'm going to close these two windows, because reopening them every time will get annoying. Interesting: abstract appears to be a list, which is odd. I'm not sure what to think about an abstract being a list. Would a paper ever have two abstracts? I've written zero papers in my life, but even in college I'm pretty sure there was only one abstract. Anyway, the abstract we want is element zero of that list. Let's stop breaking and printing and run through everything to see. Nope: sometimes we don't have an abstract. So let's wrap it in a try/except. I'm trying to see whether there's ever a time when it's not a list, because that does feel weird. My expectation is that this came from a JSON-based database, so everything should have the same format, but I could be totally wrong. In the except, we print j['abstract']; my thought is that it will be an empty list whenever we hit the exception, since that's the only thing we're printing, so hopefully we see a bunch of empty lists. Yep, cool. So in the case where there isn't an abstract, we just set abstract to an empty string. All right, so we've got the
abstract and the title, and now we need to grab our body text. We'll say full_text equals an empty string for now, and then for text in j['body_text']: print(text). Let's keep our break, because we just want to see whether we're getting what we expected. Each chunk is a dictionary, so what we actually need is text['text']. Okay, what happened there? Shoot. Okay, yeah, that looks more like what we expect. Now we append each chunk to full_text: full_text += text['text'] + '\n\n'. Beautiful. At the very end, let's print full_text, stop printing there, and see what we're looking at. Okay, so after what, 13 or 14 minutes, we finally have the data in the format we expected. Again, this is more likely what text data looks like when you get it from someone. I've had retailers and certain news companies, and I've seen something similar elsewhere too: the data is typically in a format you did not expect. It's CSV and/or JSON, not just plain text documents like you might think, or something even crazier. I've seen some stuff, y'all. So we've got things organized to an extent, and the next step depends on what our goal is. We've structured the data somewhat, or at least we're close, and now our job is to extract something meaningful from it. Coming back to the tasks, I'm looking through this list. I thought about it a little bit initially, and the problem with extracting meaning from text is that you want to start with the lowest-hanging fruit first.
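The title, abstract and body-text steps just described can be collected into one function. This is a sketch under the JSON shape seen above (metadata.title, plus abstract and body_text as lists of chunks with a text key); the name parse_paper is hypothetical. One deliberate difference: the video appears to keep only the first abstract chunk, while this version joins every abstract chunk, on the assumption that a multi-chunk abstract should be kept whole.

```python
from typing import Tuple


def parse_paper(j: dict) -> Tuple[str, str, str]:
    """Return (title, abstract, full_text) from one paper's parsed JSON.

    Assumed shape, based on what the video shows: j["metadata"]["title"] is a
    string, and j["abstract"] and j["body_text"] are lists of chunks that each
    have a "text" key.
    """
    title = j.get("metadata", {}).get("title", "")
    # The abstract list can be empty; joining gives "" instead of raising IndexError.
    # Assumption: join every chunk rather than keeping only element zero.
    abstract = "\n\n".join(chunk["text"] for chunk in j.get("abstract", []))
    # Mirrors full_text += text["text"] + "\n\n" for each body chunk.
    full_text = "".join(chunk["text"] + "\n\n" for chunk in j.get("body_text", []))
    return title, abstract, full_text
```

Using .get with defaults replaces the try/except: a paper with no abstract simply yields an empty string.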
In this case, extracting from text is so heavily a natural language processing (NLP) task that, in order to mine this data, we need to know what we're looking for. If you don't know what you're looking for, you can't find it. So the first question is how we find any of these things. For example, you might think you could just search for "non-pharmaceutical intervention," but many times somebody talking about a non-pharmaceutical intervention will phrase it differently: they'll call it something else or describe it some other way. Some terms are more stable. A vaccine will probably always be called a vaccine, and an antiviral will probably always be called an antiviral. Similarly, looking up here at transmission, incubation, and environmental stability: personally, I'm curious about how it's really transmitted. Are we talking three feet or six feet? How long does it live on surfaces? But even with surfaces, under environmental stability you have hard surfaces, porous surfaces like clothing, liquids, and so on, and it gets very complicated really quickly. So we want to find something that hopefully doesn't get too complicated too fast, figure that out, and then start tackling some of the harder ones. One other term that I think is highly unlikely to vary, at least in scholarly journals, is incubation. (Something on my keyboard keeps flipping on, and it's driving me nuts.) The term incubation is likely to always be called incubation in a scholarly journal or other research text.
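Because a term like incubation is so stable, a plain substring filter over whole papers is a reasonable first pass. A minimal pandas sketch of that filter, along the lines of what the video builds further on; the function name papers_mentioning and the sample rows are illustrative, not from the dataset. The string methods live on the .str accessor, and the match is case-sensitive by default, so a sentence starting with "Incubation" would be missed unless case=False is passed.

```python
import pandas as pd


def papers_mentioning(docs, term):
    """Return the rows of a [title, abstract, full_text] table whose full text contains term.

    Series.str.contains is the string-accessor method; calling .contains directly
    on a Series raises AttributeError. regex=False treats term as plain text.
    """
    df = pd.DataFrame(docs, columns=["title", "abstract", "full_text"])
    return df[df["full_text"].str.contains(term, regex=False)]
```

With docs built as [title, abstract, full_text] rows, papers_mentioning(docs, "incubation") keeps only the papers that use the word anywhere in their body text.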
My intuition, or expectation, is that we can use incubation as a starting point: we search each document for "incubation," and hopefully, very close to where the word is used, we can search for a duration, meaning numbers. My expectation is that we could start with a really basic regular expression, later ramp it up to find many more examples, and then hopefully filter out any mistakes. Chances are that numbers around the term incubation will be incubation times in hours or, most likely, days. Again, there are so many variables: depending on what we're looking at, an incubation period could be minutes, hours, days, weeks, or months. Incubation is a general term, but in this case I think we can search for digits followed by "days." That's how I'm going to approach this data set first, so let's begin. There are a couple of ways we could look for incubation. One option is to search full_text right here in the loop (I swear I keep hitting caps lock). But I don't think all tasks will be that easy, so instead I'd like to build a DataFrame. First, docs will be a list. Up top, I'll import pandas as pd, and in the loop we'll say docs.append([title, abstract, full_text]); the order doesn't really matter. I'm going to comment out that print, and just so we know where we are, I'm going
to add from tqdm import tqdm. This is just a nice way to make a progress bar; if you don't have it, pip install tqdm, and likewise pip install pandas. Let's also print d, and wrap os.listdir in tqdm, since eventually we'll have multiple directories. Now we'll stop the break and run it to see where we stand. Okay, that went quicker than I thought. We've imported pandas as pd, so now we say df = pd.DataFrame(docs, columns=['title', 'abstract', 'full_text']); DataFrame is title-cased, and columns is just a list of column names. To make sure things work as expected, let's output df.head(), and things look good. Now we can use the filtering that pandas gives us. We'll say incubation = df[df['full_text'].contains('incubation')], which I think will work; we'll find out. Running it gives "'Series' object has no attribute 'contains'," so we need the string accessor: df['full_text'].str.contains('incubation'). Whoa, okay, this is going well. Now we have a DataFrame consisting only of whole articles that mention incubation somewhere. We could also have filtered by title or something, but that went so fast that I don't think it's necessary. Now that we've done that, I'd say we can pull out just the text at this point. Later, if you wanted to be able to cite sources
or something, we have the information necessary. And just for the record, that went very quickly, but that was 800 out of roughly 13,000 articles, so the full run will take well over ten times as long. It depends on the size of the data; in our case it's about two gigabytes total, so we can go pretty fast and loose with our R&D here. One thing I have learned, though, is that if you're working with a very large data set, it's wise to save after each pre-processing step. We're not saving much here because we can iterate very quickly, but if you're dealing with something like a terabyte of information, save as much as you can during pre-processing. With a DataFrame, for example, we can write it out at any point, so any logic that takes a long time only needs to run once. This data set isn't that big, so we can goof off to some extent. Okay, so now we say texts = incubation['body_text'].values and begin iterating over texts: for t in texts: print(t). Oh, what's the issue? The column is full_text, not body_text. (Beautiful noises that makes; I hope it captured that ASMR for you guys.) Let's try that again with full_text. Good. So how are we going to do what we think we want to do? I think a valid method here is to split, not by space but by period; I think we can get away with splitting
everything by a period here, then look in each sentence for incubation, and if we find incubation in a sentence, look for a duration, meaning digits. There will be many problems with that, and hopefully I'll remember to address them. Obviously some of these papers will compare incubation times, and it might not even happen within one sentence: one sentence gives an incubation time, and the next sentence says "compared to the above" and gives a completely different one. So there are many gotchas, and eventually you'll need some way of detecting those cases. What I would do is search for related diseases you'd expect this one to be compared to, check whether they're being discussed anywhere around where we're about to pull an incubation time, and if so, throw it out. For now we'll keep things very simple. We iterate over texts, and then for sentence in t.split('. '), splitting on period-space so we don't split on decimals. Then if 'incubation' in sentence: print(sentence), and rather than breaking, let's print a few and see what we're dealing with. Hmm, that looks like a lot more text per split than I'd expect. t was the full text, right? for sentence in t.split(...). (My dogs are barking.) Why aren't we seeing what we'd expect? That's odd. I might have to pause, because my dogs are going crazy, which is bad timing. I'm going to go figure out what my dogs are going on about; I'll be back. Okay, to be honest I have no idea; I think we maybe got a
package; I'm not really sure. Anyway, continuing along: we're still getting this full text. We converted with .values, which I believe is what I want. if 'incubation' in sentence: print(sentence), t.split(), for t in texts: I'm slightly confused, because we've got many examples that should have been split. Why weren't they? I'm obviously missing something unbelievably obvious here. for t in texts: print(t). Oh my gosh, was I printing t the whole time? Are you kidding me? I can't even remember whether that was there. If that's really it, that's terrible. I guess I'll leave it in, but that's unfortunate. That's just embarrassing. Anyway, now we see sentences about incubation time, like "we assumed the incubation time could not exceed 30 days." Very interesting. There's "48 hours," so we're going to find stuff like that too, which isn't great. "The average incubation period was seven days" is an example of something we're looking for. "Estimated the incubation ... 5.2 days" is another example we'd look for, and "this article selects the incubation period as seven days." You get the idea. To start, we can look for some digits, a space, then days; we can do that very simply. I would also say we only accept sentences where we find incubation and exactly one digits-space-days match. The reason I want that is cases like this one: "the incubation period is seven days," and then they keep talking, "in latent persons seven days ago." That second number could be anything. I think this one is still in reference to why it's seven
days, but you might have scenarios where you get two different numbers. If our logic is "this is how we find incubation times" and then we find two numbers, well, in some cases that's a range, like an incubation time of seven to ten days, or 7-10 days, and we might find scenarios like that. But we might also find scenarios like this one, where the numbers aren't related to each other. So if we find more than one instance, our logic isn't solid and we just toss the sentence. Let's start with something super basic, because that's what we need if I can't even figure out how to split text without losing it. if 'incubation' in sentence: print(sentence). Cool. Now we use a regular expression, so import re. Don't worry, I'm no regular expression expert, as I'm sure you'll soon see. We'll say single_day = re.findall(pattern, sentence), so we're looking for a regular expression inside the sentence. The first pattern we hunt for is one or two digits, followed by a space, followed by "day," specifically "day," because sometimes these will be hours and so on. So if len(single_day) == 1, we assume it's a good find. Let's print single_day[0] and the sentence, tabbed over for nice formatting, and look at a few examples. Okay: "2 day," "7 day"... I'm still printing the sentence over here; why do I keep getting bitten by forgetting old prints? Ah, beautiful: "89 day." Where is the 89 even coming from? Oh my gosh.
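The sentence-level rule described above (split on period-space, keep sentences that mention incubation, accept only exactly one "digits day" match) can be sketched as a single function. This is one reconstruction, not the video's exact code: the names extract_incubation_days and SINGLE_DAY are hypothetical, the pattern requires a space before the digits (the fix the video arrives at for the stray 89 taken from 0.89), and statistics.mean stands in for the np.mean the video uses for the average.

```python
import re
from statistics import mean
from typing import List

# One or two digits with a space before them and " day" after them
# (" day" also matches the start of " days"). Requiring the leading space
# keeps "0.89 day" from producing a bogus "89".
SINGLE_DAY = re.compile(r" \d{1,2} day")


def extract_incubation_days(texts: List[str]) -> List[float]:
    """Collect at most one day count per sentence that mentions incubation.

    Sentences are split on ". " (period + space) so decimals like 5.2 stay intact.
    A sentence counts only when the pattern matches exactly once; two or more
    matches are treated as ambiguous and the sentence is discarded.
    The "incubation" check is case-sensitive, as in the video.
    """
    times: List[float] = []
    for t in texts:
        for sentence in t.split(". "):
            if "incubation" not in sentence:
                continue
            single_day = SINGLE_DAY.findall(sentence)
            if len(single_day) == 1:
                # " 7 day".split(" ") gives ["", "7", "day"], so the number is element 1.
                times.append(float(single_day[0].split(" ")[1]))
    return times


if __name__ == "__main__":
    # Made-up sentences for illustration, not taken from the dataset.
    sample = [
        "The mean incubation period was 7 days. Symptoms lasted 3 days.",
        "An incubation of 2 days or 14 days was reported.",
        "The incubation period was 0.89 days in one model.",
    ]
    times = extract_incubation_days(sample)
    print(times)                           # [7.0]
    print(mean(times) if times else None)  # 7.0
```

The second sample sentence is dropped because it matches twice, which is exactly the "toss it if there's more than one number" rule; note that this also throws away genuine ranges like "2 to 14 days".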
Okay, here's a question: why did it find 89? Oh, because it was 0.89, and the 89 came right after the decimal point. So let's require a space before the digits; take that. Let's try that again. Where were you, 89 days? Gotcha. Next we've got 21, in something about a susceptible compartment. Oh, so they do think 21 days is the maximum incubation. We've got "14 day," "at least 14 days," and a few others here, so that's possible. Cool, that's a good find. As you can see, we did see various things like "2 to 14 days." Also, how come it didn't find this 2 and threw this one away? I don't really know. "14 day": oh, because it was written as 14-day. Copy that. Like I said, regular expression expert. So you can already see we'd probably want to search for this and this, and also for "2 to 14 days" and "2-14," so there are many things we'd want to search for, but we'll keep it nice and simple for now. So if we find a single day, what do we do from there? The other thing I bet we're missing is that these patterns only catch whole numbers like seven, but chances are a lot of articles use decimals as well. I don't know if I can, but let's try my hand at a possible decimal and show the people how terrible of a programmer I can be at times. A possible decimal means one or two digits, then a period followed by another digit, where that whole period-and-digit part occurs zero or one times. Let's see what happens. Okay, so that's a problem: why didn't it find the whole 5.2? I thought I could encase that part in parentheses, or is it brackets? I wish I could remember. We really want that part kept together; otherwise we could match up to four digits in a row, and we don't want four digits. Maybe the parentheses are what I screwed up. Is it a bracket?
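What the video runs into here is standard re.findall behavior: when a pattern contains a capturing group, findall returns what the group captured instead of the whole match. One common fix, offered as a suggestion rather than what the video ends up using, is a non-capturing group (?:...), which still makes the decimal part optional as a unit. The sample sentence is made up.

```python
import re

# A made-up sentence for illustration.
text = "the mean incubation period was 5.2 days, with a maximum of 14 days"

# With a capturing group (...), re.findall returns only what the group captured:
# ".2" for the decimal match and "" for the whole-number match.
capturing = re.findall(r" \d{1,2}(\.\d{1,2})? day", text)

# A non-capturing group (?:...) keeps the decimal part optional as a unit,
# but findall now returns the full match.
non_capturing = re.findall(r" \d{1,2}(?:\.\d{1,2})? day", text)

# The number is the first whitespace-separated token of each match.
days = [float(match.split()[0]) for match in non_capturing]

print(capturing)      # ['.2', '']
print(non_capturing)  # [' 5.2 day', ' 14 day']
print(days)           # [5.2, 14.0]
```

This keeps one pattern for both whole and decimal day counts, so there is no need to maintain a separate integer pattern and float pattern.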
Yeah, congratulations, you've already seen how terrible I can be at times. No, I don't think a bracket is what we want. Someone comment below and remind me of the regular expression basics, because I want that whole part to be optional. For now, let's keep it simple. As we saw, you get part of the number but not the whole thing, and I think that's because of the parentheses: I think they mean findall returns only what's inside the parentheses, even though the rest still has to match. Instead, I want the whole match. I'm just not good enough; my apologies. Someone comment below and we'll get it. Okay, so we have single_day, great. Now let's make incubation_times a list. (Checking how far we are in time: another ten minutes; I think I stopped at 25 because of my lovely animals.) If we do find a match, and I was pretty happy with what this regular expression did for us (let me make sure I didn't screw up when I removed the other part; it looks good), we add it to the list: incubation_times.append(float(single_day[0])). These should all be integers at this stage, but later we might find floats, given somebody who knows how to write regular expressions better than moi, so eventually you'd want the possibility of a float. If I were doing this for a client, really wanted to know the answer to that question, and could not figure out how
to write a freaking regular expression, I would legit just write one pattern for whole-number days and another for decimal days. That's the kind of programmer I am. For now we'll do it this way. We don't need to print anymore; that's kind of pointless. At the very end, let's print incubation_times, and I'm also curious what len(incubation_times) is. Save and rerun. Where did we forget something? Line 62: oh, we never closed that parenthesis. Again, of course, it can't convert that to a float, because single_day[0] still includes the word day. So let's split by space: num = single_day[0].split(' '), since we're just trying to grab the number. (I think you could also split on "day," and you'd probably get away with that, but splitting on space works.) print(num) and break. In every instance the number is element 1, as I was hoping, which makes sense because the pattern starts with a space, so element 0 is empty. So num[1]; fantastic. Let me clean this up a little. Okay, we already have 71 examples of incubation times, and it looks like the largest is about 42. Now we can start to wrap this one up: import matplotlib.pyplot as plt, and from matplotlib import style so we can use a style. Down here we say plt.hist(incubation_times, bins=10), so the array and then the bins, and plt.show(). For the labels, plt.ylabel('bin counts') and plt.xlabel('incubation time (days)'). We're coming down the homestretch. There we go: we've got a nice histogram of projected incubation time. Again, we probably caught some things we should not have caught here, so we still have a lot of work to do, but I
think I'll show one more thing, well, a couple things. One: let's just quickly import numpy as np, and can we get away with printing the mean projected incubation time? What is it: np.mean(incubation... oops, incubation_times). Let's run that. Okay, about ten days; we'll add "days" to that as well. So yeah, that is the projected average, so we've possibly extracted some meaning there. Okay, finally, we only went through that one set of files, so let us add the others, and hopefully they're all organized the same way; we're gonna find out. The comm_use_subset, let's grab the noncomm_use_subset, and finally we'll grab the custom_license one. Copy-pasta, save. Let's try again. Hopefully that doesn't take forever... it looks like it very well might take forever. How come those first eight hundred go so fast and then this one goes so slow? Well, the good news is I'm already editing this, so maybe I'll just speed it up and make a cut, because I already have to edit this because of my lovely animals. While we wait on that, let me think if there's anything else I really need to talk about; I'll edit this whole thing out if there isn't. Yeah, I guess one thing I would do is come check out some of these kernels, to be honest. I don't know what a lot of these things are, like "understanding paper with text analytics"; there's probably a lot of really great ideas. Here's even a rule set for incubation period; let's see what they've done. It looks like they've gone a little harder on making their matches. Let's see what they found. What did you find? Interval matcher, yeah, so it looks like he built something really specific: a term matcher, an interval matcher, a period matcher. Interesting, interesting. What did you find? Looks like this is an incubation period. Oh, well, here's ours. How much data was that? Let's go look real quick. So we got 562, with a mean projected incubation time of 9.11, and what was this, ten bins? I think it was ten bins. One
thing we definitely should have done is save this array, or list rather, because it takes so long to create it. Whoops. So how many bins did you go with, bro? What did he do? plt... oh, he just made a bar graph; he has lots of bins. Anyway, okay, so we can see that on average somewhere between zero and ten days is the most common, but if you were to stack up all these bars here, up to like 18 or whatever this number would be, it would be pretty high, so up to about 20 days doesn't look like a shocking number. Of course, one of the next things I would do is try to figure out what's going on here: why is there something there? Also, even though we didn't save it as an object, we can cheat. I do want to see a few more bins, so what I'm gonna do is come on down here and say incubation_times = and paste in the printed list (I'm proud of myself for just copying it). Let's change this to 30 bins, save, one more attempt. Okay, so with more bins we can clearly see these. As I look further and try to fix this program, I would look at each of these and see what we matched that gave us these values, because I don't think those are real incubation times. We'd want to figure out where those came from to determine how we can improve our script. But yeah, it looks like somewhere between five and... what is that, seven? Five and six days, probably five and seven; where's the mouse... seven. So it looks like the incubation time is probably five to seven days on expected average, but then, like I said before, if we took this bar and stacked it on top of this one, and this one on top of that, it's clearly more than ten, maybe even up to two weeks. So anyway, okay, that's some starting information. I'm sure we made lots of incorrect pulls, and we are missing a lot of data, because I'm sure many of the day counts use decimals, and I just...
I'm just not good enough at regular expressions, apparently. I honestly just have not done regular expressions in what might even be years now; I just haven't had to deal with them. I usually get Daniel to write them. So anyway, yeah, interesting initial insights. We could keep going with this and keep trying to find more examples; I'm sure there's much more incubation information in there that we could pull from. Then, using a similar methodology, we could start to be curious about some of the other tasks that are on here. Maybe you've got an idea of something you want to look for, or maybe you want to do something totally separate from what I've done here, but on this dataset. Anyway, it's a cool dataset, it's obviously an important dataset, and I think it's a realistic dataset: this is a real problem that we're actually experiencing right now, and in general this is a data mining problem. So anyway, I think that's all. If you've got questions, comments, suggestions, concerns, whatever, feel free to leave them below. If you've got something cool, feel free to link it below and I'll check it out. Also, like I said, on Kaggle I would usually look through some of the kernels, maybe participate in some of the discussions, and probably learn some really interesting things. So yeah, that's all for now. I hope you guys are staying safe, and I will see you guys in another video.
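The parentheses question in the transcript comes down to documented behavior of Python's `re.findall`: when a pattern contains a capturing group, it returns the group's text instead of the whole match (an empty string when an optional group doesn't take part). Here is a minimal sketch, assuming a pattern along the lines of ` \d{1,2} day` (a leading space, one or two digits, then "day"), which fits the split-by-space step where the number lands at index 1; the exact pattern from the video isn't in this part of the transcript, and the sample sentences are made up. A non-capturing group `(?:...)` brings the whole match back, and `\d+(?:\.\d+)?` also accepts decimal day counts.

```python
import re

# Hypothetical sentences standing in for CORD-19 body text.
sentences = [
    "The median incubation period was 5 days.",
    "Symptoms appeared after 3-7 days in most patients.",
    "A mean incubation period of 6.4 days was estimated.",
    "No relevant number here.",
]

# Assumed single-number pattern: leading space, one or two digits, then " day".
single_day = re.compile(r" \d{1,2} day")

# One capturing group: findall returns only the group's text, not the full match.
with_group = re.compile(r" \d{1,2}(-\d{1,2})? day")

# Non-capturing groups (?:...): findall returns the whole match again,
# and \d+(?:\.\d+)? also accepts decimals such as 6.4.
whole_match = re.compile(r" \d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)? day")

for s in sentences:
    print(single_day.findall(s), with_group.findall(s), whole_match.findall(s))

# Converting the simple matches by splitting on spaces: each match starts with a
# space, so " 5 day".split(" ") gives ['', '5', 'day'] and the number is at index 1.
incubation_times = []
for s in sentences:
    found = single_day.findall(s)
    if found:
        incubation_times.append(float(found[0].split(" ")[1]))

print(incubation_times)  # [5.0]
```

If you only want the first number of a range, wrapping just that number in a single capturing group, e.g. ` (\d+(?:\.\d+)?)(?:-\d+(?:\.\d+)?)? day`, makes `findall` return only that number, which skips the split step.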
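Adding the other subsets and caching the result (the transcript notes the list should have been saved, since building it is slow) could look roughly like this. It is a sketch under several assumptions: the subset folder names and the `body_text` layout of the CORD-19 JSON files as of March 2020, the `". "` sentence split, the "incubation" filter, and the helper names `collect_incubation_times` and `load_or_collect`, which are hypothetical. The demo builds a throwaway folder so it runs without the real dataset.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumption: one folder per CORD-19 subset (names below), paper JSON files
# somewhere beneath each folder, and a "body_text" list of {"text": ...}
# paragraphs in each file.
SUBSETS = ["biorxiv_medrxiv", "comm_use_subset", "noncomm_use_subset", "custom_license"]

# Hypothetical single-number pattern: leading space, one or two digits, " day".
SINGLE_DAY = re.compile(r" \d{1,2} day")


def collect_incubation_times(data_root, subsets=SUBSETS):
    """Scan every subset folder and return day counts from incubation sentences."""
    times = []
    for subset in subsets:
        subset_dir = Path(data_root) / subset
        if not subset_dir.is_dir():
            continue  # skip subsets that were not downloaded
        for path in sorted(subset_dir.rglob("*.json")):
            paper = json.loads(path.read_text(encoding="utf-8"))
            for paragraph in paper.get("body_text", []):
                # Assumed filter: only sentences that mention incubation.
                for sentence in paragraph["text"].split(". "):
                    if "incubation" not in sentence.lower():
                        continue
                    found = SINGLE_DAY.findall(sentence)
                    if found:
                        # " 5 day".split(" ") -> ['', '5', 'day']
                        times.append(float(found[0].split(" ")[1]))
    return times


def load_or_collect(cache_file, data_root):
    """Return the cached list if it exists; otherwise build it once and save it."""
    cache = Path(cache_file)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    times = collect_incubation_times(data_root)
    cache.write_text(json.dumps(times), encoding="utf-8")
    return times


# Demo on a throwaway folder with one fake paper in two of the subsets.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fake_paper = {"body_text": [{"text": "The incubation period was 5 days. Other text."}]}
    for subset in SUBSETS[:2]:
        (root / subset).mkdir()
        (root / subset / "paper.json").write_text(json.dumps(fake_paper), encoding="utf-8")
    cache_file = root / "incubation_times.json"
    first = load_or_collect(cache_file, root)   # scans the JSON files, writes the cache
    second = load_or_collect(cache_file, root)  # reads the cache instead of rescanning
    print(first, second)
```

Caching to JSON keeps the list human-readable; a pickle or `np.save` would work just as well for a list this simple.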
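For the summary step the video uses `np.mean` and `plt.hist(incubation_times, bins=10)`, then retries with 30 bins. To make the `bins` argument concrete without needing matplotlib, here is a stdlib-only sketch of equal-width binning over `[min, max]`, which is what an integer `bins` means for `numpy.histogram` (and, by default, for `plt.hist`). The values are hypothetical, not the video's 562 results, and `equal_width_histogram` is a made-up helper.

```python
from statistics import mean


def equal_width_histogram(values, bins=10):
    """Count values in `bins` equal-width bins spanning [min(values), max(values)].

    Each bin is half-open [left, right) except the last one, which also
    includes max(values), mirroring numpy.histogram's default range.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        lo, hi = lo - 0.5, hi + 0.5  # numpy widens a zero-width range the same way
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        counts[min(index, bins - 1)] += 1  # clamp max(values) into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical incubation times in days; 42.0 plays the suspicious outlier.
incubation_times = [2.0, 3.0, 5.0, 5.0, 6.0, 7.0, 7.0, 10.0, 14.0, 42.0]

counts, edges = equal_width_histogram(incubation_times, bins=10)
print("mean projected incubation time:", mean(incubation_times), "days")
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} days: {n}")
```

Apart from values sitting exactly on an inner bin edge, where floating-point rounding may differ, these counts should match what `numpy.histogram` reports for the same data. A lone value in the last bin, like 42 here, is the kind of bar the transcript suggests tracing back to the sentence it came from.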
Info
Channel: sentdex
Views: 111,392
Rating: 4.9539928 out of 5
Id: S6GVXk6kbcs
Length: 50min 20sec (3020 seconds)
Published: Sat Mar 21 2020