explaingit

nanmicoder/mediacrawler

49,801 · Python · Audience: developer · Complexity: 3/5 · Active · Setup: moderate

TLDR

Python tool for scraping public posts, videos, and comments from major Chinese social media platforms such as Douyin (the Chinese counterpart of TikTok), Xiaohongshu, Bilibili, and Weibo using browser automation.

Mindmap

mindmap
  root((repo))
    What it does
      Scrape posts videos
      Fetch comments replies
      Search by keyword
      Creator page crawling
    Platforms supported
      Xiaohongshu RedNote
      Douyin TikTok
      Bilibili Weibo
      Kuaishou Zhihu
    How it works
      Browser automation
      JavaScript extraction
      Session management
      Chrome DevTools Protocol
    Export formats
      CSV JSON Excel
      SQLite MySQL
    Tech stack
      Python Playwright
      FastAPI web UI
      Node.js optional
    Use cases
      Research trends
      Sentiment analysis
      Learning scraping

Things people build with this

USE CASE 1

Collect public posts and videos from Chinese social media platforms for research or trend analysis.

USE CASE 2

Gather user comments and sentiment data from multiple platforms to understand audience reactions.

USE CASE 3

Learn how browser-based web scraping works by studying the Playwright automation approach.

USE CASE 4

Export social media data to CSV, JSON, or database formats for further analysis.
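As a rough illustration of the export use case, the sketch below writes a small list of records to CSV, JSON, and SQLite using only the Python standard library. The records, field names (`post_id`, `platform`, `title`, `like_count`), and function names are hypothetical and are not taken from MediaCrawler's own schema or code; check the repository for the actual storage layer.

```python
import csv
import json
import sqlite3

# Hypothetical records: the field names are illustrative only,
# not MediaCrawler's real schema.
posts = [
    {"post_id": "a1", "platform": "weibo", "title": "Hello", "like_count": 12},
    {"post_id": "b2", "platform": "bilibili", "title": "你好", "like_count": 7},
]


def export_csv(rows, path):
    """Write the rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def export_json(rows, path):
    """Write the rows as a JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


def export_sqlite(rows, path):
    """Upsert the rows into a `posts` table keyed by post_id."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts ("
            "post_id TEXT PRIMARY KEY, platform TEXT, "
            "title TEXT, like_count INTEGER)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO posts "
            "VALUES (:post_id, :platform, :title, :like_count)",
            rows,
        )
        conn.commit()
    finally:
        conn.close()
```

Keying the SQLite table on `post_id` means that re-running an export replaces existing rows rather than duplicating them, which is one reasonable choice when the same post may be crawled more than once.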

Tech stack

Python · Playwright · FastAPI · Node.js · Chrome DevTools Protocol

Getting it running

Difficulty: moderate · Time to first run: 30 min

Requires installing Playwright's browser binaries. Some platforms may also require logging in with an account or dealing with anti-scraping measures.

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

MediaCrawler is a Python tool for scraping publicly available content from major Chinese social media platforms. It supports collecting posts, videos, and comments from Xiaohongshu (RedNote), Douyin (Chinese TikTok), Kuaishou (a short-video app), Bilibili (a video platform), Weibo, Tieba (a Chinese forum site), and Zhihu (a Q&A platform similar to Quora). The tool can search by keyword, crawl specific post IDs, fetch comments and replies, and pull content from specific creator pages.

The core technical approach relies on browser automation using Playwright, a library that controls a real web browser programmatically. Instead of manually reverse-engineering each platform's API encryption, which is complex and fragile, the tool logs into the platform through the browser, maintains the authenticated session, and then uses JavaScript within that browser context to extract the signed request parameters. This avoids the need to crack encrypted API signatures, making the tool easier to maintain. By default it connects to an already-open Chrome browser using the Chrome DevTools Protocol (CDP), which lets it reuse your existing login state and cookies and reduces the chance of the platform detecting automated activity.

The README carries a clear disclaimer stating the tool is intended for learning and research only, not commercial use or large-scale scraping, and links to documented cases of illegal scraping activity in China.

You would use this repository if you are a researcher studying social media trends, a data analyst gathering public sentiment data from Chinese platforms, or a developer learning how browser-based scraping works. Data can be exported to CSV, JSON, Excel, SQLite, or MySQL.

The tech stack is Python (3.11 recommended) with Playwright for browser automation and Node.js as an optional dependency for JavaScript execution. A simple web UI is also included, built with a FastAPI backend.
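The idea of attaching to an already-open Chrome over CDP can be sketched as follows. This is a minimal illustration, not MediaCrawler's own code: it assumes Chrome was started with `--remote-debugging-port=9222`, and the port, the endpoint constant, and all function names are hypothetical. The Playwright import is placed inside the function so that the two standard-library helpers can be used without Playwright installed.

```python
import json
from urllib.request import urlopen

# Assumption: Chrome was launched with --remote-debugging-port=9222.
CDP_HTTP_ENDPOINT = "http://127.0.0.1:9222"


def websocket_url_from_version(payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def fetch_websocket_url(endpoint: str = CDP_HTTP_ENDPOINT) -> str:
    """Ask the running Chrome for its CDP WebSocket URL (needs a live browser)."""
    with urlopen(f"{endpoint}/json/version", timeout=5) as resp:
        return websocket_url_from_version(resp.read().decode("utf-8"))


def list_cookie_names(endpoint: str = CDP_HTTP_ENDPOINT) -> list[str]:
    """Attach to the running Chrome and list cookie names from its session."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # The first context is the browser's existing default context,
        # which already carries the user's cookies and login state.
        context = browser.contexts[0]
        names = [cookie["name"] for cookie in context.cookies()]
        browser.close()  # disconnects from a CDP-attached browser
        return names
```

Because the script attaches to a browser the user has already logged into, it does not have to handle credentials itself. It simply works with whatever session the default context holds.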

Copy-paste prompts

Prompt 1
Show me how to set up MediaCrawler to scrape posts from Xiaohongshu by keyword and export them to CSV.
Prompt 2
How does MediaCrawler use Playwright and Chrome DevTools Protocol to avoid API signature cracking?
Prompt 3
Help me configure MediaCrawler to fetch comments and replies from a specific Douyin video ID.
Prompt 4
What are the steps to run MediaCrawler's web UI and crawl content from multiple Chinese platforms at once?
Prompt 5
Explain how MediaCrawler maintains authenticated sessions to reduce detection by social media platforms.

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.