Study how a real competition team approached ad conversion prediction to learn from their feature engineering decisions
Reproduce the team's 0.832321 competition score to understand what modeling choices drove their ranking
Adapt the time-aware history relevance scoring technique for your own recommendation or ad ranking system
The README is in Chinese, and running the code requires access to the competition dataset and a configured ML training environment.
This repository contains the code that a team called "exclusive" used in a 2026 Tencent advertising machine learning competition. The team finished in 51st place with a final score of 0.832321. The code is written in Python and is open-sourced so others can see the approach they took.
The core problem is PCVR (predicted conversion rate): estimating the probability that a user will convert after seeing an ad, where a conversion might mean clicking through or making a purchase. The team noticed that user behavior has strong time patterns: people act differently at different hours of the day and on different days of the week, and the timing of past actions also matters when interpreting a current request. They also noticed that only some of a user's browsing history is relevant to any given ad, and that training data from too far in the past can hurt predictions because user behavior shifts over time.
To handle these observations, the team made several technical changes. They added time-based features that encode the hour and day of the week of each ad impression. They attached a timestamp to each event in a user's history, so the model can see not just what the user did before, but when. To match ads to relevant history, they score each historical action by how closely it relates to the current ad, then weight the history summary toward the most relevant actions. They also restricted training to the most recent 90% of samples, so the model learns from behavior patterns closer to the deployment period.
The remaining changes cover how user numeric features are grouped and projected before entering the model, and how very large category tables are handled with a memory-efficient hashing approach that uses a learned gate to control how much each hashed signal contributes.
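The three time-related ideas above (calendar features, keeping the most recent samples, and relevance-weighted history) can be sketched in plain Python. This is an illustrative sketch, not the team's code: the function names, the "ts" and "vec" field names, the dot-product relevance score, and the per-day age penalty are all assumptions standing in for whatever learned scoring the repository actually uses.

```python
import math
from datetime import datetime, timezone


def time_features(ts):
    """Hour of day and day of week for a Unix timestamp (UTC is assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"hour": dt.hour, "weekday": dt.weekday()}  # Monday == 0


def keep_most_recent(samples, fraction=0.9):
    """Keep the newest `fraction` of samples, ordered by their 'ts' field."""
    ordered = sorted(samples, key=lambda s: s["ts"])
    n_keep = int(round(len(ordered) * fraction))
    return ordered[len(ordered) - n_keep:]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_history(ad_vec, history, now_ts, decay_per_day=0.1):
    """Summarize history as a weighted average that favors relevant events.

    Hypothetical scoring: dot(ad, event) minus a penalty that grows with
    the age of the event. A trained model would learn this score instead.
    """
    if not history:
        return [0.0] * len(ad_vec), []
    scores = []
    for event in history:
        similarity = sum(a * b for a, b in zip(ad_vec, event["vec"]))
        age_days = (now_ts - event["ts"]) / 86400.0
        scores.append(similarity - decay_per_day * age_days)
    weights = softmax(scores)
    pooled = [
        sum(w * event["vec"][i] for w, event in zip(weights, history))
        for i in range(len(ad_vec))
    ]
    return pooled, weights
```

Calling `pool_history([1.0, 0.0], history, now_ts)` returns the pooled vector together with one weight per historical event. Events that are both similar to the ad and recent receive the largest weights, which is the behavior the description attributes to the team's approach.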
The repository includes scripts for training, inference, and a default shell script that reproduces the submission configuration. Most parameters can be overridden through environment variables. The README is written in Chinese, so readers who do not read Chinese will need a translation tool to follow the detailed technical notes.
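As a rough illustration of overriding parameters through environment variables, the sketch below reads defaults and replaces any that are set in the environment. The variable names and default values here are hypothetical, chosen only for the example; check the repository's shell script for the names it actually reads.

```python
import os

# Hypothetical parameter names and defaults, not taken from the repository.
DEFAULTS = {"LEARNING_RATE": 0.001, "BATCH_SIZE": 256, "RECENT_FRACTION": 0.9}


def load_config(env=None):
    """Return DEFAULTS with any same-named environment variable applied."""
    env = os.environ if env is None else env
    config = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Cast the override to the default's type so "512" becomes 512.
        config[name] = type(default)(raw) if raw is not None else default
    return config
```

With this pattern, running a script as `BATCH_SIZE=512 python train.py` (script name assumed) would change only the batch size and leave every other parameter at its default.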
Verify against the repo before relying on details.