psf/requests-html

★ 13,846 | Python | Audience · developer | Complexity · 2/5 | Setup · moderate

Mindmap

mindmap
  root((requests-html))
    Fetching
      Auto redirects
      Cookie handling
      Browser user-agent
    Parsing
      CSS selectors
      XPath queries
      Element attributes
    Advanced
      JavaScript rendering
      Headless Chromium
      Async requests
    Install
      pip install
      Python 3 only


Things people build with this

USE CASE 1

Scrape product prices, headlines, or article text from websites using CSS selectors or XPath queries.

USE CASE 2

Fetch data from JavaScript-rendered pages by triggering headless Chromium to execute the page's scripts before parsing.

USE CASE 3

Scrape multiple URLs simultaneously using async requests instead of waiting for each page one at a time.

USE CASE 4

Maintain cookies and session state across multiple requests to scrape pages that require login.
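As an illustration of the first use case, here is a minimal sketch of pulling headline links out of a page with either a CSS selector or an XPath query. The helper names, the `h2 a` selector, and the `//h2/a` query are assumptions about a hypothetical page layout, not something taken from the library's documentation; adjust them to the site you are scraping.

```python
def headlines_by_css(html, selector="h2 a"):
    """Return (text, href) pairs for every element matching a CSS selector.

    `html` is assumed to be a requests_html.HTML object such as `response.html`.
    """
    return [(el.text, el.attrs.get("href")) for el in html.find(selector)]


def headlines_by_xpath(html, query="//h2/a"):
    """Same extraction, expressed as an XPath query instead of a CSS selector."""
    return [(el.text, el.attrs.get("href")) for el in html.xpath(query)]


def scrape_headlines(url):
    """Fetch `url` and extract headline links (hypothetical page layout)."""
    # Imported here so the pure helpers above can be used without the library.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    return headlines_by_css(response.html)
```

Keeping the extraction helpers separate from the fetch makes it easy to try a different selector against a page that has already been downloaded.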

Tech stack

Python · Chromium · pip

Getting it running

Difficulty · moderate | Time to first run · 30min

JavaScript rendering downloads headless Chromium on first use, which requires additional disk space and time.
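A rough sketch of what that rendering step can look like in practice. The function names, the 20-second timeout, and the selector argument are illustrative assumptions; `render()` is the call that starts headless Chromium, so the first run is the one that triggers the download.

```python
def rendered_text(response, selector, timeout=20):
    """Run the page's JavaScript, then return the text of the first match.

    `response` is assumed to come from an HTMLSession request.
    """
    # render() launches headless Chromium; the first call also downloads it.
    response.html.render(timeout=timeout)
    element = response.html.find(selector, first=True)
    return element.text if element is not None else None


def fetch_rendered_text(url, selector):
    """Fetch `url`, render it, and read one element (hypothetical helper)."""
    # Imported here so rendered_text() can be used without the library.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        return rendered_text(session.get(url), selector)
    finally:
        # Closing the session also shuts down the browser it started.
        session.close()
```

Returning `None` for a missing element keeps the caller from hitting an attribute error when the selector does not match the rendered page.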

No license information is mentioned in the explanation.

In plain English

Requests-HTML is a Python library for fetching web pages and pulling specific data out of them. It extends the popular requests HTTP library with the ability to parse the HTML that comes back from a web request, which is useful for scraping information from websites.

The library handles several things that make web scraping tricky. It automatically follows redirects, maintains cookies between requests, pools connections for efficiency, and sends a browser-like user-agent header so servers treat the requests as though they came from a real web browser. You get these behaviors without any extra configuration.

For extracting data from a page, the library supports two query styles. The first is CSS selectors, which work similarly to jQuery and let you find elements by tag name, class, ID, or combinations. The second is XPath, an older path-based query language that is more verbose but also more precise. Once you find an element, you can read its text, access its attributes, or pull sub-elements from it.

One notable feature is JavaScript rendering. Many modern websites load their content dynamically via JavaScript after the initial HTML arrives. Requests-HTML can run JavaScript by launching a headless Chromium browser in the background, waiting for it to finish executing, and then parsing the resulting page. This is an optional step you call explicitly when needed.

The library also supports async requests, meaning you can fetch several pages at the same time rather than waiting for each one to finish before starting the next. This speeds things up considerably when you need to scrape many URLs.

Requests-HTML is part of the Python Software Foundation's GitHub organization and was created by the author of the requests library. It is available via pip and targets Python 3.
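The async support described above can be sketched roughly as follows. The helper names are hypothetical, and reading the `<title>` element is just a stand-in for whatever you actually want to extract from each page.

```python
def make_fetcher(asession, url):
    """Build a zero-argument coroutine function that fetches one URL."""
    async def fetch():
        response = await asession.get(url)
        title = response.html.find("title", first=True)
        return url, (title.text if title is not None else None)
    return fetch


def fetch_titles(urls):
    """Fetch all URLs concurrently and map each URL to its <title> text."""
    # Imported here so make_fetcher() can be used without the library.
    from requests_html import AsyncHTMLSession

    urls = list(urls)
    if not urls:
        return {}
    asession = AsyncHTMLSession()
    # run() takes coroutine functions (not coroutine objects) and returns
    # their results; each result carries its URL, so ordering does not matter.
    results = asession.run(*(make_fetcher(asession, url) for url in urls))
    return dict(results)
```

Each fetcher returns its URL alongside the extracted value because the results are not guaranteed to come back in the order the requests were started.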

Copy-paste prompts

Prompt 1

Using requests-html, scrape all article headlines and links from a news website using CSS selectors and save the results to a JSON file.

Prompt 2

I need to scrape data from a React-rendered product page. Show me how to use requests-html's render() method to execute JavaScript before parsing.

Prompt 3

Write an async requests-html script that fetches 50 product pages at the same time and extracts the title and price from each.

Prompt 4

Use requests-html with XPath queries to extract structured data from a Wikipedia article and turn it into a Python dictionary.

Verify against the repo before relying on details.