Use as reference code to understand how a specific Chinese site's login flow works before writing your own scraper.
Learn how to handle CAPTCHA images, RSA-encrypted passwords, and custom session cookies in Python web scraping.
Build a data collection script for a Chinese website that requires a logged-in session to access any content.
The project is no longer maintained. It was started in 2016, and some login flows may be broken because the target sites have since updated their authentication systems.
This Python project provides scripts that automate the login process for a collection of popular Chinese websites. The purpose is to help developers who want to scrape or collect data from sites that require a logged-in session before any content becomes accessible. Logging in programmatically is often the first and most tedious step in that kind of work.

The repository includes login implementations for around 20 sites, including Zhihu (a Q&A platform similar to Quora), Weibo (a microblogging service), Baidu, Douban, Bilibili, JD.com (an e-commerce site), and several others. Each implementation handles the specific authentication flow that site uses, which can involve encrypted passwords, CAPTCHA images, or custom session cookies. The project relies on three Python libraries: requests for sending web requests, Pillow for processing CAPTCHA images, and rsa for handling encryption.

The README is written in Chinese and includes a note at the top stating that the project is no longer maintained. It was started in 2016, and some of the login flows may have stopped working as the target websites updated their authentication systems. The author acknowledged this and noted that only a few representative cases would be kept current going forward.

Contributions were accepted while the project was active, with guidelines asking for compatibility with both Python 2 and Python 3. Video tutorials were also produced and published on Chinese video platforms for beginners learning about web scraping and login automation.
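To make the "custom session cookies" part concrete, here is a minimal sketch of persisting login cookies to disk and reloading them, so a scraper does not have to repeat the login (and CAPTCHA) on every run. It uses only the standard library and is not taken from the repository: the helper names `make_cookie`, `save_session`, and `load_session`, the cookie name `sessionid`, and the domain `example.com` are all hypothetical. In a real script the cookies would come from the site's login response rather than being built by hand.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain):
    """Build a cookie by hand (hypothetical helper).

    A real login response would set cookies like this for us.
    """
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False,
        expires=int(time.time()) + 3600,  # valid for one hour
        discard=False,
        comment=None, comment_url=None, rest={},
    )


def save_session(jar, path):
    """Write every cookie in the jar to a file, including session-only ones."""
    jar.save(path, ignore_discard=True, ignore_expires=True)


def load_session(path):
    """Return a jar loaded from the file, or an empty jar if no file exists."""
    jar = http.cookiejar.LWPCookieJar()
    if os.path.exists(path):
        jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar


if __name__ == "__main__":
    cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
    jar = http.cookiejar.LWPCookieJar()
    jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
    save_session(jar, cookie_file)
    for cookie in load_session(cookie_file):
        print(cookie.name, cookie.value)
```

A script built on requests would typically attach such a jar to its session object so that later requests reuse the stored login. That wiring is assumed here and should be checked against the requests documentation and the target site's current behavior.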
← xchaoinfo on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.