explaingit

rongfeng-guo/rank51-2026taac-kddcup

15 · Python · Audience: researcher · Complexity: 4/5 · Setup: hard

TLDR

Competition code from a team that finished 51st in a 2026 Tencent ad machine learning contest. The model predicts whether users will convert after seeing an ad, using time-aware features and relevance-scored browsing history.

Mindmap

mindmap
  root((rank51-kddcup))
    Problem
      Ad conversion prediction
      User behavior patterns
      Time-aware signals
    Techniques
      Time-based features
      History relevance scoring
      Recent data filtering
      Memory-efficient hashing
    Workflow
      Training script
      Inference script
      Shell submission config
    Configuration
      Environment variables
      Parameter overrides


Things people build with this

USE CASE 1

Study how a real competition team approached ad conversion prediction to learn from their feature engineering decisions

USE CASE 2

Reproduce the team's 0.832321 competition score to understand what modeling choices drove their ranking

USE CASE 3

Adapt the time-aware history relevance scoring technique for your own recommendation or ad ranking system

Tech stack

Python

Getting it running

Difficulty · hard · Time to first run · 1 day+

The README is in Chinese, and running the code requires access to the competition dataset and a configured ML training environment.

In plain English

This repository contains the code that a team called "exclusive" used in a 2026 Tencent advertising machine learning competition. The team finished in 51st place with a final score of 0.832321. The code is written in Python and is open-sourced so others can see the approach they took.

The core problem is called PCVR: predicting whether a user will convert after seeing an ad. Conversion might mean clicking through or making a purchase. The team made three observations. First, user behavior has strong time patterns: people act differently at different hours of the day and on different days of the week, and the timing of past actions matters when interpreting a current request. Second, only some of a user's browsing history is relevant to any given ad. Third, training data from too far in the past can hurt predictions, because user behavior shifts over time.

To handle these observations, the team made several technical changes. They added time-based features that encode the hour and day of the week of each ad impression. They attached a timestamp to each event in a user's history, so the model can see not just what the user did before, but when. To match ads to relevant history, they score each historical action by how closely it relates to the current ad, then weight the history summary toward the most relevant actions. They also restricted training to the most recent 90% of samples, so the model learns from behavior closer to the deployment period.

The remaining changes cover how user numeric features are grouped and projected before entering the model, and how very large category tables are handled: a memory-efficient hashing approach maps categories into a smaller table, and a learned gate controls how much each hashed signal contributes.
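The techniques described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the team's implementation: the function names, the `ts` field, the dot-product relevance score, the softmax weighting, and the CRC32 hash are all hypothetical choices made here for clarity, and the repository may do each step differently.

```python
import math
import zlib
from datetime import datetime, timezone


def time_features(ts: int) -> dict:
    """Hour-of-day and day-of-week for a Unix timestamp (UTC assumed here)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"hour": dt.hour, "weekday": dt.weekday()}  # Monday == 0


def keep_most_recent(samples: list, fraction: float = 0.9) -> list:
    """Sort samples by their (hypothetical) 'ts' field and keep the newest fraction."""
    ordered = sorted(samples, key=lambda s: s["ts"])
    n_keep = round(len(ordered) * fraction)
    return ordered[len(ordered) - n_keep:]


def relevance_pool(history: list, ad: list) -> list:
    """Summarize history vectors, weighted by their relevance to the ad vector.

    Relevance is assumed to be a dot product, turned into weights by a softmax.
    """
    if not history:
        return [0.0] * len(ad)
    scores = [sum(h_i * a_i for h_i, a_i in zip(h, ad)) for h in history]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * h[d] for w, h in zip(weights, history)) for d in range(len(ad))]


def gated_hashed_embedding(value: str, table: list, gate_logits: list) -> list:
    """Look up a category in a small hashed table and scale it by a sigmoid gate.

    In a real model both `table` and `gate_logits` would be learned parameters.
    """
    bucket = zlib.crc32(value.encode("utf-8")) % len(table)
    gate = 1.0 / (1.0 + math.exp(-gate_logits[bucket]))
    return [gate * x for x in table[bucket]]
```

For example, `time_features(0)` describes midnight on Thursday, 1 January 1970, and `relevance_pool` returns a summary dominated by whichever history vector scores highest against the ad. Hashing lets many category values share a fixed number of table rows, and the gate gives the model a way to damp rows where collisions make the signal unreliable.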
The repository includes scripts for training, inference, and a default shell script that reproduces the submission configuration. Most parameters can be overridden through environment variables. The README is written in Chinese, so readers who do not read Chinese will need a translation tool to follow the detailed technical notes.
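One common way to support environment-variable overrides is sketched below. The parameter names and default values are hypothetical placeholders, not taken from the repository; check the shell script and README for the actual variable names.

```python
import os

# Hypothetical parameter names and defaults, for illustration only.
DEFAULTS = {"LEARNING_RATE": 0.001, "BATCH_SIZE": 1024, "RECENT_FRACTION": 0.9}


def load_config(env=None) -> dict:
    """Return the defaults, replaced by any matching environment variables."""
    env = os.environ if env is None else env
    config = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Cast the override string to the same type as the default value.
        config[name] = type(default)(raw) if raw is not None else default
    return config
```

With this pattern, running a script as `BATCH_SIZE=256 python train.py` (the script name is again a placeholder) would change only the batch size and leave the other defaults untouched.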

Copy-paste prompts

Prompt 1
Walk me through how rank51-2026taac-kddcup encodes time-of-day and day-of-week features for ad impressions and why those signals help predict conversion.
Prompt 2
Explain the history relevance scoring technique in this PCVR competition repo and how it weights a user's past actions toward the current ad.
Prompt 3
I want to run the training script from rank51-2026taac-kddcup. What environment variables do I need and what does the default shell script do?
Prompt 4
How does rank51-2026taac-kddcup handle very large category tables with memory-efficient hashing and what is the learned gate doing?

Verify against the repo before relying on details.