Python Web Scraping - Should I use Selenium, Beautiful Soup or Scrapy? [2020]

Video Statistics and Information

Captions
[Music]

Hey everyone, it's Kaelyn from Kite, the AI autocomplete plugin for Python. Today we're gonna do an overview of web scraping with Python. Web scraping is super useful and has many different applications. It could be as simple as extracting pricing information for a product database or, as is often the case with machine learning, we'll need a large training data set for a model, so we'll want to collect this data by scraping sources on the web. In both of these examples, web scraping is a critical component in completing the project, so it's a skill you'll want to have mastered.

In this video, you'll learn the pros and cons of the three Python web scraping frameworks you should know: Selenium, Beautiful Soup, and Scrapy. I'll also give you scenarios of when a certain framework is better or more effective than the others. So let's jump in.

Beautiful Soup is the first framework we will be going over. It is overall very well rounded. The first positive is that it's well known for being user friendly. This framework is simple to set up: Beautiful Soup can extract HTML and XML elements from a web page with just a few lines of code. Let's take a look at an example script. This is a simple Beautiful Soup script to extract the text between HTML tags. This block of HTML code contains a div element with the string "some text" inside of its tags. With Beautiful Soup, we can parse the HTML by calling the BeautifulSoup constructor with our desired parsing method, and then we can extract the text with parsed_html.find(...).text to find the contents of the div tag where class is "class name". As is evident from this code, Beautiful Soup is easy to quickly learn and then master.

This is a great time to mention that Kite's AI code assistant can help you learn web scraping even faster. Kite plugs into your favorite code editor to provide ML-powered code completions, and it also comes with a desktop app that automatically looks up the docs for any given object your cursor is located on. It's free to use, and the download link is in the description.

Beautiful Soup does have its drawbacks, however. For one, Beautiful Soup requires dependencies. That's why I had to import bs4 instead of Beautiful Soup. Package dependencies can make it complicated to transfer code between projects and machines. Lastly, it is important to note that Beautiful Soup is a slow web scraper, so for large data sets there would be noticeable bottlenecks and slowdowns.

Selenium was developed to facilitate automated web testing, but it has found an off-label use as a web scraper. This makes Selenium especially versatile, since it can be used for automated testing and web scraping at the same time. For example, if the web scraping that you need done is in a web application that will also need to be automatically tested, then Selenium would be a great choice, since you are able to get both of these functionalities with a single framework.

Another pro is that Selenium is good at working with JavaScript and the DOM, or Document Object Model. While Beautiful Soup is mainly used for HTML and XML parsing, Selenium can scrape content that a web page generates with JavaScript. This can be extremely useful, as many websites, especially e-commerce websites, now use JavaScript to dynamically load their content. You can tell this by inspecting a website and looking for script tags. If you see those, the site is likely using JavaScript to load some of the content.

One of the cons of Selenium is the same as one of its pros: it's not designed to be a web scraping framework. This means it's not as user friendly, making it a steeper learning curve to climb than Beautiful Soup. Lastly, Selenium is also slow when we want to scrape large amounts of data.

And on that note, let's talk about Scrapy. Unlike Beautiful Soup and Selenium, Scrapy is known for being very fast and efficient. This is because Scrapy is written with Twisted, a popular event-driven networking framework for Python, and that gives Scrapy some asynchronous capabilities. For instance, Scrapy doesn't have to wait for a response when handling multiple requests, so it runs faster right out of the box. Per se, speed is a major reason to use Scrapy.

Another pro is portability. It was written in Python and, unlike Beautiful Soup, requires no dependencies. Scrapy eases the headache of making sure a web scraper works on all operating systems.

There's one big con with Scrapy, and that's its user friendliness. While there's plenty of documentation online to help you out, Scrapy requires some prerequisite knowledge and a lot of setup. If you haven't used Scrapy before, make sure to check out our video tutorial on Scrapy. We walk you through scraping the cat subreddit as a quick start guide to Scrapy. The link is in the description below.

So let's summarize the top three Python frameworks for web scraping and when you should or should not use them. If you have a small project or want to get a quick test up without any trouble, then Beautiful Soup is probably the best option to go with. It's just so easy to get started. Selenium is the clear choice for any web scraping project that will interact with automated testing, or if you need to scrape a page that's using JavaScript to load its contents. And if you're doing a large amount of web scraping and expect to generate a lot of data, then Scrapy should be your choice, thanks to its built-in efficiency.

Thanks for watching this video on the top web scrapers in Python. The best way to learn is to implement each of these frameworks in a project. Make sure to subscribe to our channel, as we'll have more web scraping content coming your way, from tutorials to more in-depth projects featuring Beautiful Soup, Selenium, and Scrapy. Finally, don't forget to check out the Kite AI autocomplete plugin. Links are in the description below.
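The on-screen Beautiful Soup script is not captured in the captions, so the following is only a reconstruction from the narration, not the video's actual code. It assumes the `beautifulsoup4` package is installed, uses Python's built-in `html.parser` as the parsing method, and uses `class_name` as a hypothetical class attribute standing in for the spoken "class name".

```python
from bs4 import BeautifulSoup

# Hypothetical HTML block: a div holding the string "some text".
# The class value "class_name" is an assumption based on the narration.
html = '<div class="class_name">some text</div>'

# Parse the HTML by calling the BeautifulSoup constructor with a parser.
parsed_html = BeautifulSoup(html, "html.parser")

# find() returns the first tag that matches; .text is the text inside it.
div = parsed_html.find("div", attrs={"class": "class_name"})
text = div.text

print(text)
```

If no tag matches, `find()` returns `None`, so a real scraper would check the result before reading `.text`.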
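The point about not waiting for each response before sending the next request can be sketched with the standard library. This is not Scrapy or Twisted code: it swaps in `asyncio` to show the same event-driven idea, and `fake_fetch`, the example URLs, and the 0.2 second delay are all hypothetical stand-ins for real network requests.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    # Stand-in for a network request: sleeping yields control to the
    # event loop, so other "requests" can progress in the meantime.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(urls):
    # Start every request up front, then collect the results in order.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]

start = time.perf_counter()
responses = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start

# Five sequential requests would take about 5 * 0.2 = 1.0 seconds;
# overlapping them should take roughly the time of a single request.
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

The waits overlap instead of adding up, which is the kind of saving the narration attributes to Scrapy's asynchronous design.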
Info
Channel: Kite
Views: 121,550
Rating: 4.924993 out of 5
Keywords: python selenium tutorial, python selenium, selenium webdriver python, python web scraping, web scraping with python, web scraping python, python scrapy, data scraping with python, scrapy python, web scraping python beautifulsoup, python web scraping tutorial, web scraping in python, python web scraping beautifulsoup, python scrapy tutorial, python web scraper, python webscraping, web scraping python tutorial, scrapy python tutorial, scrapy tutorial python 3
Id: zucvHSQsKHA
Length: 5min 28sec (328 seconds)
Published: Sat Feb 29 2020