Collect public posts and videos from Chinese social media platforms for research or trend analysis.
Gather user comments and sentiment data from multiple platforms to understand audience reactions.
Learn how browser-based web scraping works by studying the Playwright automation approach.
Export social media data to CSV, JSON, or database formats for further analysis.
Requires a Playwright browser installation; some platforms may also require handling anti-scraping measures or account authentication.
MediaCrawler is a Python tool for scraping publicly available content from major Chinese social media platforms. It collects posts, videos, and comments from Xiaohongshu (RedNote), Douyin (the Chinese version of TikTok), Kuaishou (a short-video app), Bilibili (a video platform), Weibo, Tieba (a Chinese forum site), and Zhihu (a Q&A platform similar to Quora). The tool can search by keyword, crawl specific post IDs, fetch comments and replies, and pull content from specific creator pages.

The core technical approach is browser automation with Playwright, a library that controls a real web browser programmatically. Rather than reverse-engineering each platform's API encryption, which is complex and fragile, the tool logs in to the platform through the browser, keeps the authenticated session, and runs JavaScript inside that browser context to extract the signed request parameters. Because no encrypted API signatures have to be cracked, the tool is easier to maintain. By default it connects to an already-open Chrome browser over the Chrome DevTools Protocol (CDP), which lets it reuse your existing login state and cookies and reduces the chance that the platform detects automated activity.

The README carries a clear disclaimer stating that the tool is intended for learning and research only, not for commercial use or large-scale scraping, and it links to documented cases of illegal scraping activity in China. You would use this repository if you are a researcher studying social media trends, a data analyst gathering public sentiment data from Chinese platforms, or a developer learning how browser-based scraping works.

Data can be exported to CSV, JSON, Excel, SQLite, or MySQL. The tech stack is Python (3.11 recommended) with Playwright for browser automation and Node.js as an optional dependency for JavaScript execution. A simple web UI with a FastAPI backend is also included.
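As a rough illustration of the CDP approach described above, the following Python sketch attaches Playwright to a Chrome instance that is already running with remote debugging enabled, then runs JavaScript inside a page of the existing session. It is not MediaCrawler's actual code: the helper names `cdp_endpoint` and `read_session_state`, the port 9222, and the choice to read `navigator.userAgent` are assumptions made for illustration only. Playwright is imported inside the function so that the small URL helper works even where Playwright is not installed.

```python
def cdp_endpoint(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Build the HTTP endpoint of a Chrome started with remote debugging.

    The default host and port are assumptions for this sketch.
    """
    return f"http://{host}:{port}"


def read_session_state(endpoint: str, url: str) -> dict:
    """Attach to a running Chrome over CDP and read state from its session.

    Hypothetical helper: it shows the general pattern only, not how
    MediaCrawler extracts signed request parameters.
    """
    # Lazy import so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the already-open browser instead of launching a new one.
        browser = p.chromium.connect_over_cdp(endpoint)
        # Reuse the existing context so cookies and login state carry over.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(url)
        # Run JavaScript inside the page, as the tool does to obtain values
        # that only exist in the browser context.
        user_agent = page.evaluate("() => navigator.userAgent")
        cookie_names = [cookie["name"] for cookie in context.cookies(url)]
        page.close()
        return {"user_agent": user_agent, "cookie_names": cookie_names}
```

Assuming Chrome was started with `--remote-debugging-port=9222`, a call such as `read_session_state(cdp_endpoint(), "https://www.bilibili.com")` would return the browser's user agent and the names of the cookies visible for that site. The sketch leaves the user's browser open and only closes the tab it created.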
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.