Learn how to extract product prices from an e-commerce website using Python step by step.
Build a news headline collector that automatically pulls articles from multiple sites.
Practice scraping JavaScript-rendered pages that don't work with basic HTML parsing.
Requires installing Python and libraries such as BeautifulSoup and Scrapy; the companion book provides the full explanations.
This repository contains the code samples that accompany the book "Web Scraping with Python, 2nd Edition", published by O'Reilly. The book teaches readers how to write Python programs that automatically collect information from websites, a technique called web scraping.

Web scraping is the practice of writing code that visits a web page, reads its content, and extracts specific pieces of information, such as product prices, news headlines, or data tables. Instead of copying and pasting information by hand, a scraping program can do the same thing automatically, at scale, across many pages.

The code in this repository is organized into Jupyter notebooks. Jupyter is a tool that lets you run Python code in a browser-based document alongside text explanations and output. Each notebook corresponds to a chapter or concept from the book. The author recommends cloning the repository and running the notebooks locally rather than reading them directly on GitHub, because some formatting may not display correctly in the browser. The repository also includes a separate folder with code from the first edition of the book, for readers working from the older version.

Because websites change their structure over time and Python libraries receive updates, some code samples may become outdated after publication. The author acknowledges this and invites readers to submit corrections through GitHub pull requests.

The README for this repository is brief and points to the book itself for full context. The repository is primarily a companion resource rather than a standalone project.
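To make the description of web scraping above concrete, here is a minimal sketch of the "read the content and extract specific pieces" step. It is not taken from the book or the repository. It assumes a hypothetical page in which each price sits in a `<span class="price">` element, and it uses Python's standard-library `html.parser` instead of BeautifulSoup or Scrapy so that it runs without extra installs. The names `SAMPLE_HTML`, `PriceParser`, and `extract_prices` are illustrative only.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML
# from a website instead of hard-coding it.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False  # True while inside a price span
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())


def extract_prices(html):
    """Return the price strings found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


print(extract_prices(SAMPLE_HTML))
```

The assumption about the `price` class is exactly the kind of detail that breaks when a site changes its markup, which is why scraping code tends to go out of date.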
Verify against the repo before relying on details.