Analysis updated 2026-05-18
Reproduce the Sarvam-105B identity fragility finding by running a single curl command against the API
Run the full 38-test or 75-test suite to verify all reported vulnerabilities against a live Sarvam API key
Study pre-captured evidence JSON files to understand how LLM identity manipulation and reasoning leakage work
Reference the OWASP LLM Top 10 classifications for these findings when writing your own security reports
| flawme/sarvam-2026-001 | 0-bingwu-0/live-interpreter | 0xkaz/llm-governance-dashboard | |
|---|---|---|---|
| Stars | 2 | 2 | 2 |
| Language | Python | Python | Python |
| Setup difficulty | easy | moderate | hard |
| Complexity | 2/5 | 2/5 | 4/5 |
| Audience | researcher | general | ops devops |
Figures from each repo's GitHub metadata at analysis time.
Requires a Sarvam AI API key from api.sarvam.ai to reproduce the findings, the pip dependency is just the requests library.
This repository publishes a security assessment of Sarvam-105B, a large language model made by Sarvam AI and accessed through their API. The researcher found and documented eight weaknesses, submitted a report to Sarvam AI in May 2026, and waited 32 days for a response. When no follow-up came, the report was published publicly under standard responsible disclosure practices. The three most serious findings are specific to Sarvam's deployment. The first is identity fragility: when you call the Sarvam-105B API with a neutral system message, the model responds claiming to be Google Gemini instead of Sarvam AI. When you add a tools array to the API call, it claims to be OpenAI ChatGPT. This happens without any deliberate manipulation and affects any standard API deployment that uses system messages or function calling. The second high-severity finding is reasoning content leakage: the API's reasoning field in its responses can expose the contents of the system prompt, which developers typically treat as private. The five lower-severity findings describe prompt injection weaknesses that are common across the AI industry and not unique to this model. The repository includes the full PDF report (20 pages), pre-captured JSON evidence files showing the raw API requests and responses for each of the eight vulnerabilities, and Python test scripts so anyone with a Sarvam API key can reproduce the findings. The most basic finding can be verified with a single curl command against the public API. Vendor acknowledgment arrived the day after the report was submitted, but no follow-up came within the stated 32-day window. The researcher then published the full disclosure, including the correspondence with Sarvam AI.
A published security assessment reporting eight vulnerabilities in Sarvam AI's 105B language model API, including identity spoofing and system prompt leakage.
Mainly Python. The stack also includes Python, Sarvam AI API.
Free to read and share with attribution for non-commercial purposes, no modifications allowed and no commercial use permitted.
Setup difficulty is rated easy, with roughly 5min to a first successful run.
Mainly researcher.
This repo across BitVibe Labs
Verify against the repo before relying on details.