Archive all posts from a set of Weibo accounts, including text, timestamps, likes, and media, to a local database.
Build a dataset of Weibo posts from specific users for research or social media trend analysis.
Download all images and videos attached to Weibo posts to a local folder.
Schedule automatic daily runs to keep a Weibo user's post history continuously up to date.
Requires editing a config file with Weibo user IDs; providing a browser cookie gives access to login-gated content.
This is a Python tool for collecting posts and profile data from Weibo, a large social media platform in China similar in style to Twitter. Given one or more Weibo user IDs, the tool fetches everything those accounts have posted and saves the results to files on your computer. The README and most of the documentation are written in Chinese.

For each user, the tool collects two categories of data. The first is profile information: the user's display name, follower and following counts, number of posts, location, verified status, and similar account details. The second is post data: the text of each post, when it was published, how many likes and comments it received, what device it was posted from, any hashtags or mentions it contains, and the original post if the item is a repost rather than new content.

Beyond text, the tool can also download images and videos attached to posts. You can configure separately whether to download media from original posts and from reposts. Collected data can be written to CSV files, JSON files, or stored in a MySQL, MongoDB, or SQLite database, depending on what you configure.

Setup involves editing a configuration file to specify the user IDs you want to collect, the date range, whether to include only original posts or reposts as well, and your output format preferences. Providing a browser cookie is optional but allows access to data that would otherwise require being logged in to Weibo.

The tool also supports scheduled automatic runs, so you can set it up to check back every few days and download only the posts published since the last run. A Docker image is available if you prefer not to install Python dependencies directly on your machine. An optional API service mode is also mentioned; its details are in the full README.
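The "download only new posts since the last run" behavior can be pictured with a small sketch. This is not the tool's own code: the function name `save_new_posts`, the table layout, and the record fields (`post_id`, `created_at`, `text`, `likes`) are hypothetical, chosen only to illustrate one way an incremental run against a SQLite database might work. It assumes timestamps are ISO 8601 strings in a single consistent format.

```python
import sqlite3
from datetime import datetime


def save_new_posts(conn, user_id, posts):
    """Store only posts newer than the latest one already saved for user_id.

    Hypothetical helper for illustration; returns how many posts were new.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "post_id TEXT PRIMARY KEY, user_id TEXT, "
        "created_at TEXT, text TEXT, likes INTEGER)"
    )
    # ISO 8601 strings in one format sort chronologically, so MAX() works on text.
    row = conn.execute(
        "SELECT MAX(created_at) FROM posts WHERE user_id = ?", (user_id,)
    ).fetchone()
    last_seen = datetime.fromisoformat(row[0]) if row[0] else datetime.min

    # Keep only posts published after the newest stored post.
    fresh = [
        p for p in posts
        if datetime.fromisoformat(p["created_at"]) > last_seen
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)",
        [
            (p["post_id"], user_id, p["created_at"], p["text"], p["likes"])
            for p in fresh
        ],
    )
    conn.commit()
    return len(fresh)
```

Calling this once per scheduled run with the posts fetched for a user would add only the unseen ones; the `PRIMARY KEY` plus `INSERT OR IGNORE` acts as a second guard against duplicates. How the actual tool tracks its last run should be checked in the README.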
Verify against the repo before relying on details.