Python Scrapy Tutorial - 12 - Item containers ( Storing scraped data )

Video Statistics and Information

Captions
All righty guys, in the previous video we extracted the quotes and authors, so in this video we are going to learn how to put that extracted data into containers called items. Now why exactly do we need containers? We have already extracted the data, so can't we just put it straight into some kind of database? The answer is yes, obviously you can, but there can be a few problems with storing the data directly in the database when you are working on big or multiple projects. Scrapy spiders can return the extracted data as Python dictionaries, which is what we have been doing so far in our quotes project, but the problem with Python dictionaries is that they lack structure: it is easy to make a typo in a field name or to return inconsistent data, especially in a larger project with many spiders. Our quotes project is very small right now, which is why we haven't run into these kinds of mistakes yet. So it is always a good idea to move the scraped data into temporary containers first and then store it in the database, and these temporary containers are called items.

We will be using the items.py file to create our item containers, and if you look over here you can see that the QuotetutorialItem class was automatically created for us by Scrapy when we created the project. We have a couple of fields inside our quotes_spider: the title field, the author field and the tag field. To declare these fields inside items.py, you simply write the name of the field followed by scrapy.Field(). So we are just going to uncomment this line and, instead of name, call it title, and then copy and paste it two more times, writing author and tags in place of title.
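The typo problem described above can be illustrated without Scrapy at all. A plain dict accepts any key, while a container that declares its fields rejects unknown names; Scrapy's real scrapy.Item behaves similarly for undeclared fields. This is only a pure-Python sketch of the idea, not Scrapy's implementation:

```python
# Pure-Python sketch (no Scrapy needed) of why declared-field containers
# beat plain dicts: an unknown field name fails loudly instead of silently.

class Item(dict):
    fields = ()  # subclasses list their allowed field names here

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{key!r} is not a declared field")
        super().__setitem__(key, value)

class QuotetutorialItem(Item):
    # mirrors the fields declared in items.py
    fields = ("title", "author", "tag")

plain = {}
plain["titel"] = "quote text"      # typo: a dict accepts it silently

item = QuotetutorialItem()
item["title"] = "quote text"       # declared field: fine
try:
    item["titel"] = "quote text"   # the same typo is caught immediately
except KeyError as err:
    print("rejected:", err)
```

A mistyped field name surfaces on the very line that made the mistake, instead of showing up later as a silently wrong column in the database.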
And then I'm just going to remove this pass statement. Now that we have declared the fields inside the QuotetutorialItem class, we need to import it from the items.py file inside our quotes_spider.py file. That is pretty simple: we write from ..items, which goes up from inside the spiders folder to the items.py file, and from there we import the QuotetutorialItem class, so the full line is from ..items import QuotetutorialItem.

Now, inside the parse method, we create a new variable; let's call it items. This is an instance variable, because we are creating an instance of the QuotetutorialItem class: we type items = QuotetutorialItem(). If you don't know about classes and objects, this is basically it: QuotetutorialItem is a class, and to create an instance of it we write the name we want for the instance, an equals sign, and then the class name followed by parentheses. Now that we have created the instance, we use the QuotetutorialItem blueprint to store values inside it. So we write items['title'], and the name inside the square brackets is the field name we declared over here; if we had declared the field as titles, we would need to write titles here too. It is just easier to give the item fields the same names as the variables you have extracted, so instead of titles I'm going to call it title to make our work a little easier, and then items['title'] = title, where title is the name of the
variable we extracted earlier. Now, just like we did with the title, we do the same for the author and the tags: copy and paste the line two more times and write author in both places, and tag in both places; if you're following along, you can just write it with me. One thing to check: in items.py I called this field tags, but over here I've called the variable tag, so this is not going to work. Instead of tags we can just call the field tag so that the two names match. And now, instead of yielding all of those key-value pairs, we can yield just one thing, items, and this makes sure that all of our items are returned properly.

So we are going to run this crawler once to make sure that everything is working, and then I'll go through all of this code again so that you properly understand what is happening. Let me run the crawler: I'll open a terminal window, go inside our quotetutorial project, write scrapy crawl followed by the name of the spider, quotes, and press Enter. Everything should work; let it run, and if we scroll up you will be able to see that all of the quotes are getting scraped properly.

So what we have done is this: inside the items.py file we created a temporary container called QuotetutorialItem and declared some fields in it; then we imported that class from items.py into the spider; after importing the class we created an instance called items, and then we used the blueprint of the class to make sure that
the title, the author and the tags are stored in their respective containers, and then we simply yielded the items. This actually looks a lot cleaner than what we were doing previously, which was yielding the dictionary's key-value pairs directly. If you have any confusion about classes, instances and objects, make sure you check out the videos on classes, objects and inheritance that I have added in this video series, because they will help you understand object-oriented programming.

Anyway, now that our items are in proper temporary containers, we can move on to the third part: storing them in our database. In the next video we are going to use a very simple kind of storage, the JSON format, so we will be storing all of our data as JSON in the next video. I'll see you over there.
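Stripped of Scrapy specifics, the flow inside the parse method described above looks roughly like this. Here extracted is stand-in sample data in place of real response.css(...) selectors, and a plain dict stands in for the real QuotetutorialItem; only the shape of the loop and the single yield at the end match the video:

```python
# Sketch of the parse flow: fill one container per quote, yield it once.
# The data below is illustrative, not taken from quotes.toscrape.com.

def parse(extracted):
    # extracted: list of (title, author, tag) tuples, as the spider
    # would pull them out of the page with CSS selectors
    for title, author, tag in extracted:
        items = {}               # stands in for items = QuotetutorialItem()
        items["title"] = title   # field names match those declared in items.py
        items["author"] = author
        items["tag"] = tag
        yield items              # one yield of the container, not raw pairs

sample = [("A witty quote.", "Albert Einstein", ["change", "world"])]
for item in parse(sample):
    print(item["author"])
```

The point of the single yield is that every record leaving the spider has exactly the fields declared in items.py, so downstream code (pipelines, exporters) can rely on a consistent shape.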
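As a tiny preview of that next step (the values here are made up for illustration), item-shaped records serialize straight to JSON with the standard library:

```python
import json

# Item-shaped data (illustrative values) written out as JSON text.
items = [
    {"title": "A quote.", "author": "Someone", "tag": ["life"]},
]
text = json.dumps(items, indent=2)
print(text)
```

Because every item has the same declared fields, the resulting JSON is a uniform list of objects, which is exactly what makes the export step in the next video straightforward.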
Info
Channel: buildwithpython
Views: 39,293
Rating: 4.9675455 out of 5
Keywords: web scraping, scrapy, python, scraping, python web scraping tutorial, beginner, python projects, web scrapping, web crawler, tutorial
Id: QksUFT2Cmlo
Length: 6min 59sec (419 seconds)
Published: Mon Jan 21 2019