nanmicoder/mediacrawler

Analysis updated 2026-05-18

★ 48,940 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((repo))
    What it does
      Scrape posts videos
      Fetch comments replies
      Search by keyword
      Creator page crawling
    Platforms supported
      Xiaohongshu RedNote
      Douyin TikTok
      Bilibili Weibo
      Kuaishou Zhihu
    How it works
      Browser automation
      JavaScript extraction
      Session management
      Chrome DevTools Protocol
    Export formats
      CSV JSON Excel
      SQLite MySQL
    Tech stack
      Python Playwright
      FastAPI web UI
      Node.js optional
    Use cases
      Research trends
      Sentiment analysis
      Learning scraping

What do people build with it?

USE CASE 1

Collect public posts and videos from Chinese social media platforms for research or trend analysis.

USE CASE 2

Gather user comments and sentiment data from multiple platforms to understand audience reactions.

USE CASE 3

Learn how browser-based web scraping works by studying the Playwright automation approach.

USE CASE 4

Export social media data to CSV, JSON, or database formats for further analysis.
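As a rough illustration of the export step, the sketch below writes a list of post records to CSV, JSON, and SQLite using only the Python standard library. The field names (`post_id`, `platform`, `title`, `like_count`), the `posts` table, and the three helper functions are hypothetical and chosen for this example; they are not MediaCrawler's actual schema or API.

```python
import csv
import json
import sqlite3
from pathlib import Path

# Hypothetical record layout, for illustration only.
FIELDS = ["post_id", "platform", "title", "like_count"]


def export_csv(rows, path):
    """Write rows (a list of dicts) to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def export_json(rows, path):
    """Write rows to a JSON file, keeping non-ASCII text readable."""
    text = json.dumps(rows, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def export_sqlite(rows, path):
    """Upsert rows into a 'posts' table, keyed on post_id."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts ("
            "post_id TEXT PRIMARY KEY, platform TEXT, "
            "title TEXT, like_count INTEGER)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO posts "
            "VALUES (:post_id, :platform, :title, :like_count)",
            rows,
        )
        conn.commit()
    finally:
        conn.close()
```

Keying the SQLite table on `post_id` means a repeated crawl overwrites a post instead of duplicating it, which is one reasonable choice when the same keyword search is run more than once.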

What is it built with?

Python, Playwright, FastAPI, Node.js, Chrome DevTools Protocol

How does it compare?

	nanmicoder/mediacrawler	jingyaogong/minimind	run-llama/llama_index
Stars	48,940	49,021	49,173
Language	Python	Python	Python
Setup difficulty	moderate	hard	moderate
Complexity	3/5	4/5	3/5
Audience	developer	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires Playwright browser installation and may need to handle anti-scraping measures or account authentication for some platforms.

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

MediaCrawler is a Python tool for scraping publicly available content from major Chinese social media platforms. It supports collecting posts, videos, and comments from Xiaohongshu (RedNote), Douyin (the Chinese counterpart of TikTok), Kuaishou (a short-video app), Bilibili (a video platform), Weibo, Tieba (a Chinese forum site), and Zhihu (a Q&A platform similar to Quora). The tool can search by keyword, crawl specific post IDs, fetch comments and replies, and pull content from specific creator pages.

The core technical approach relies on browser automation using Playwright, a library that controls a real web browser programmatically. Instead of manually reverse-engineering each platform's API encryption, which is complex and fragile, the tool logs into the platform through the browser, maintains the authenticated session, and then uses JavaScript within that browser context to extract the signed request parameters. This avoids the need to crack encrypted API signatures, making the tool easier to maintain. By default it connects to an already-open Chrome browser using the Chrome DevTools Protocol (CDP), which lets it reuse your existing login state and cookies and reduces the chance of the platform detecting automated activity.

The README carries a clear disclaimer stating the tool is intended for learning and research only, not commercial use or large-scale scraping, and links to documented cases of illegal scraping activity in China.

You would use this repository if you are a researcher studying social media trends, a data analyst gathering public sentiment data from Chinese platforms, or a developer learning how browser-based scraping works. Data can be exported to CSV, JSON, Excel, SQLite, or MySQL.

The tech stack is Python (3.11 recommended) with Playwright for browser automation and Node.js as an optional dependency for JavaScript execution. A simple web UI is also included, built with a FastAPI backend.
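To make the CDP idea concrete, here is a minimal sketch of attaching Playwright to an already-running Chrome and reading its session state. It is not MediaCrawler's own code. It assumes Chrome was started with `--remote-debugging-port=9222`, and the helper names (`cdp_endpoint`, `cookies_to_header`, `read_login_state`) are hypothetical. The `page.evaluate` call only reads the user agent; a real crawler would call the site's own signing JavaScript at that point.

```python
from typing import Iterable, Mapping


def cdp_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Build the HTTP endpoint of a Chrome started with a remote debugging port."""
    return f"http://{host}:{port}"


def cookies_to_header(cookies: Iterable[Mapping[str, str]]) -> str:
    """Join cookie dicts (with 'name' and 'value' keys) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def read_login_state(endpoint: str, site_url: str) -> dict:
    """Attach to a running Chrome over CDP and read its cookies for site_url."""
    # Imported here so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # Reuse the already-open default context so existing logins carry over.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(site_url)
        # Run JavaScript inside the page. This placeholder reads the user agent;
        # signed request parameters would be produced by site-specific code.
        user_agent = page.evaluate("() => navigator.userAgent")
        cookie_header = cookies_to_header(context.cookies(site_url))
        page.close()
        return {"user_agent": user_agent, "cookie": cookie_header}
```

Calling `read_login_state(cdp_endpoint(), "https://example.com")` would return the cookies the open browser already holds for that site, which is the sense in which the CDP mode reuses an existing login rather than starting a fresh, empty browser profile.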

Copy-paste prompts

Prompt 1

Show me how to set up MediaCrawler to scrape posts from Xiaohongshu by keyword and export them to CSV.

Prompt 2

How does MediaCrawler use Playwright and Chrome DevTools Protocol to avoid API signature cracking?

Prompt 3

Help me configure MediaCrawler to fetch comments and replies from a specific Douyin video ID.

Prompt 4

What are the steps to run MediaCrawler's web UI and crawl content from multiple Chinese platforms at once?

Prompt 5

Explain how MediaCrawler maintains authenticated sessions to reduce detection by social media platforms.

Frequently asked questions

What is mediacrawler?

A Python tool for scraping public posts, videos, and comments from major Chinese social media platforms such as Douyin, Xiaohongshu, Bilibili, and Weibo using browser automation.

What language is mediacrawler written in?

Mainly Python. The stack also includes Playwright and FastAPI, with Node.js as an optional dependency.

What license does mediacrawler use?

License could not be detected automatically. Check the repository's LICENSE file before use.

How hard is mediacrawler to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is mediacrawler for?

Mainly developers.

Verify against the repo before relying on details.