Google Colab + Kaggle - Downloading Datasets & Uploading Submissions from a Notebook

Captions
Hi everyone. In this video I will talk about how to use Kaggle's API to check available datasets, download datasets, and upload submissions to Kaggle, all within a Google Colab notebook. Before I get started, I will briefly give some background on Kaggle. Kaggle is a website where people can go to learn about data and computer science. Kaggle's platform offers various datasets that you can access for free, as well as competitions that you can enter to compete for fun or for prizes. Kaggle is owned by Google, and over time I think they will continue to make it easier and easier to work with both Google Colab and Kaggle.

Great, so let's get started. The first thing that we need to do is generate an API token in order to work with Kaggle within our notebook. So I have Kaggle up, and the first thing you need to do is go to your profile. You can click on your profile link in the right-hand corner, go to Account, and under API you should see Create New API Token. That will download, and you'll have a JSON file that you can upload.

There are a few different ways that you can upload the file. If you have it up, you can just go to your downloads and drag and drop. Another way, just to give another example, is to call files.upload() to upload it from your downloads, and in order to do this you also need to import files from google.colab. If I hit Shift+Enter, I choose a file, the Kaggle one that's in my downloads, and great, we have that up and we can keep moving on.

In order to work with Kaggle we'll need to work with some shell commands, and any time we want to use shell commands within our notebook we can use the exclamation character in Google Colab. The first shell command we'll run is mkdir, and this will create a folder for us to store our JSON file. The next command we want to run is cp, and the cp command is going to copy the file into that new folder. Finally, we will call chmod 600 so that only our own user can read and write the file. Great, so we should be all set.

The first thing I'll show everyone how to do is take a look at the current competitions that are going on within Kaggle. Again, any time you want to run a shell command you have to start with the exclamation point, and we'll just type out the competitions list command. Great, so now we have a list of all the current competitions. These are the references, and this is the deadline. If you ever see a deadline that goes out to 2030, that most likely means that there is no deadline; it's indefinite. Anyone familiar with Kaggle will know that the Titanic one is the first competition that most people are introduced to, and that one doesn't have a deadline. Then we have a bunch of other ones, and it tells you the category: Getting Started, which is obviously for when you're new, Playground, Featured, Research, and Analytics, and that probably rounds it out. Research and Analytics will probably be some of the harder ones. Then it'll tell you if there are any rewards. Some of them have prizes, others are just for fun, so Knowledge means there are no cash prizes for those, and then you have a few that have cash prizes for the top winners. If we look at the last two columns, there's team count, the number of teams that are registered in each of these competitions, and finally user has entered, which tells you if you have entered. You can see that I did the Titanic one, and I believe that I did this one as well.

Great, so the next thing I'll show you is how to take a look at all available datasets. Not all of the datasets in Kaggle are linked to competitions. Some of them are from past competitions, and some of them are just datasets that are available to people who want to analyze the data, look into it, or use it for their own research. The way that we do this is we'll call kaggle again, and we'll use datasets and list. Similar to the previous one, we have the reference ID, then we have the title of the dataset: Pfizer vaccine tweets, credit card customers, California traffic collisions. Next we have the actual size of the dataset. Some of these are pretty small and some of them are relatively large; this one is in the gigabyte range, and most of them are a few megabytes or smaller. The next column, last updated, is just the last time that the dataset was updated or uploaded to Kaggle, and finally download count is how many times Kagglers have downloaded this particular dataset.

Okay, great. Next let's work on actually downloading some of this data. I'll download a competition dataset and I'll download just a regular dataset. Let me first download the competition one: call kaggle, then we need to specify competitions download -c, and I just want to download the Titanic one. This is the one that almost everybody gets started with when they first start, so we'll just download that one. It looks like it's downloaded, and the way you can check is to go to your Files section, and this is the Titanic dataset, so we were able to download those. Before I actually show you that data, I'll also download a dataset from the datasets list. I will probably do a smaller one. I'll do this one on chess. Similarly, we call kaggle again, we want to download, and we need to type out the reference name. Great, it looks like that one went through. Let's check. Yep.

Just to show everybody what this looks like, I will read these into a pandas DataFrame. First I'll set that up, and we'll check the chess one, so copy path. If I scroll up, this is all the information from the chess one: various ratings. Great, and let's also check the Titanic dataset too. This is the training dataset that you would use when you're building your model. Great, that looks like it went through, and anybody that's familiar will see that this is the famous Titanic dataset.

The final function that I want to show everybody is how to upload results. When I downloaded this, Kaggle provided an example of a submission, as it usually does, so I can just use this as the submission for the Titanic dataset and upload it. What I'll do is call kaggle again, say that we're submitting, say which competition we're submitting for, provide the file, which you can just copy from your Files section here, and then finally add any comments, so I'll just put in "test submission". I will run this. Okay, it looks like we were able to successfully submit that, and you can always check: I can go to Compete, and there it is, I just submitted it, and it gives me the score. Again, this is just the generic one that they provided, and we were able to submit all this.

Okay, so thank you for tuning in. Hopefully that was helpful. If you have any questions or comments about working with Google Colab and Kaggle, feel free to leave a comment. You can also connect with me over Twitter, LinkedIn, or GitHub; I'm happy to answer any questions. Feel free to subscribe if you found this helpful. Thank you everyone, and happy coding.
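The steps described in the captions can be sketched roughly as follows. This is a minimal illustration, not the exact notebook from the video: the function names install_kaggle_token and kaggle_commands are hypothetical, the target folder ~/.kaggle is the location the Kaggle command-line tool conventionally reads its token from (the captions only say "a folder"), and the dataset reference and submission file name are placeholders because the captions do not spell them out.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Mirror the mkdir / cp / chmod 600 steps from the video in plain Python.

    token_path: path to the downloaded kaggle.json (e.g. after files.upload()).
    home: base directory; defaults to the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"                    # assumed conventional location
    kaggle_dir.mkdir(parents=True, exist_ok=True)    # like: mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.copy(token_path, target)                  # like: cp kaggle.json ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)    # like: chmod 600 (owner read/write only)
    return target


def kaggle_commands(competition, dataset_ref, submission_file, message):
    """Build the CLI calls walked through in the video as argument lists.

    All four arguments are supplied by the caller; nothing here is taken
    from the video except the command shapes themselves.
    """
    return {
        "list_competitions": ["kaggle", "competitions", "list"],
        "list_datasets": ["kaggle", "datasets", "list"],
        "download_competition": [
            "kaggle", "competitions", "download", "-c", competition,
        ],
        "download_dataset": [
            "kaggle", "datasets", "download", "-d", dataset_ref,
        ],
        "submit": [
            "kaggle", "competitions", "submit",
            "-c", competition, "-f", submission_file, "-m", message,
        ],
    }
```

In a Colab cell each of these would normally be typed directly with a leading exclamation point, for example !kaggle competitions list. The argument lists above could instead be passed to subprocess.run(..., check=True) once the token is installed; "titanic" is the competition name used in the video, while the dataset reference (in owner/dataset form) and the submission file path have to be copied from the listing output and the Files pane.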
Info
Channel: Adrian Dolinay
Views: 2,872
Rating: 4.9365077 out of 5
Keywords: kaggle, Kaggle, Google Colab, Colab, Jupyter Environment, Data Science Competition, Kaggle API, Kaggle API Token, Kaggle competitions, Kaggle dataset, Kaggle data set
Id: m-As6o-SLtI
Length: 12min 16sec (736 seconds)
Published: Tue Jan 05 2021