Analysis updated 2026-05-18
Run the demo dashboard to replay 21 walk-forward research generations and watch trading signals being born, thriving, and getting pruned across 11 years of data.
Use the live Propose button to have an AI author and backtest a brand-new trading signal in real time against historical stock data.
Study how to build a self-improving AI research loop where improvement is measured by a market-graded backtest, not by an LLM scoring itself.
Experiment with memory ablation to understand how vector-search memory affects the quality of AI-proposed quantitative signals.
| | shjavokhir/quant-alpha | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
|---|---|---|---|
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | hard | hard | hard |
| Complexity | 4/5 | 4/5 | 3/5 |
| Audience | researcher | researcher | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires API keys for Google Gemini, MongoDB Atlas, Voyage AI, and DigitalOcean Gradient; the offline demo dashboard works without them.
DARWIN is a Python-based research system that automatically invents quantitative trading signals (called alphas), tests each one against historical stock market data, keeps the ones that work, and uses memory of past results to get progressively better at proposing new signals. The core claim is that the improvement is measured by a deterministic backtest, not by an AI judging itself.

The system runs in generations. Each generation, it evaluates every signal currently in its library against a trailing window of data using metrics like information coefficient, turnover, and transaction costs. Weak or redundant signals get pruned. A language model then proposes new candidate signals based on a memory store of every past success and failure. These candidates are tested in a sandboxed backtest on data the proposer has not seen, and only the ones that pass get added to the library. The whole fleet is then scored on the next out-of-sample block.

The memory layer is the key piece. In a controlled experiment across 600 US stocks from 2013 to 2024, the system with memory enabled discovered 68 keeper signals with positive out-of-sample quality, compared to 31 without memory and 21 from random formula search, which averaged negative quality. Memory more than doubled the number of useful signals found.

The tech stack combines Google Gemini 2.5 Flash as the primary alpha proposer, MongoDB Atlas with Voyage AI embeddings for the memory and similarity search layer, a managed agent called Antigravity that spins up an isolated cloud environment to browse research literature and write code for new signals, and MiniMax M2.5 served on DigitalOcean Gradient as an alternative reasoning model. The demo dashboard runs locally with one shell script and replays a committed research run offline. A live mode is also available for proposing or researching new signals in real time.
The README is upfront about limitations: net of realistic trading costs the book does not make money, and the 2024 holdout was not exceptional. The contribution is the improving research loop itself, not a profitable trading strategy.
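To make "net of realistic trading costs" concrete, here is a small sketch of a linear cost model. The function name `net_return`, the linear form, and the 10 basis point figure are assumptions for illustration only; the repository's actual cost model may differ.

```python
def net_return(gross_return: float, turnover: float, cost_bps: float) -> float:
    """Gross period return minus an assumed linear trading cost.

    turnover: fraction of the book traded in the period (1.0 = whole book once)
    cost_bps: assumed cost per unit of turnover, in basis points
    """
    return gross_return - turnover * cost_bps / 10_000


gross = 0.002  # 0.2% gross return for the period (made-up number)
low_turnover = net_return(gross, turnover=0.5, cost_bps=10)
high_turnover = net_return(gross, turnover=3.0, cost_bps=10)
print(low_turnover > 0, high_turnover < 0)  # True True
```

Under these assumptions the same gross edge is profitable at low turnover and loss-making at high turnover, which is how a signal library can have positive predictive quality while the book still fails to make money after costs.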
An AI research agent that invents quantitative trading signals, prunes the weak ones, and accumulates memory to propose better signals over time, graded by a deterministic backtest that cannot be fooled.
Mainly Python. The stack also includes Google Gemini and MongoDB Atlas.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.