flawme/sarvam-2026-001

Analysis updated 2026-05-18

★ 2PythonAudience · researcherComplexity · 2/5LicenseSetup · easy

Mindmap

mindmap
  root((SARVAM-2026-001))
    Key Findings
      Identity fragility
      Reasoning leakage
      Prompt injection
    Severity Tiers
      3 high Sarvam-specific
      5 low industry-wide
    Evidence
      38-test suite
      75-test deep analysis
      JSON captures
    Disclosure
      Submitted May 2026
      32-day window
      No vendor follow-up

mindmap root((SARVAM-2026-001)) Key Findings Identity fragility Reasoning leakage Prompt injection Severity Tiers 3 high Sarvam-specific 5 low industry-wide Evidence 38-test suite 75-test deep analysis JSON captures Disclosure Submitted May 2026 32-day window No vendor follow-up

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Reproduce the Sarvam-105B identity fragility finding by running a single curl command against the API

USE CASE 2

Run the full 38-test or 75-test suite to verify all reported vulnerabilities against a live Sarvam API key

USE CASE 3

Study pre-captured evidence JSON files to understand how LLM identity manipulation and reasoning leakage work

USE CASE 4

Reference the OWASP LLM Top 10 classifications for these findings when writing your own security reports

What is it built with?

PythonSarvam AI API

How does it compare?

	flawme/sarvam-2026-001	0-bingwu-0/live-interpreter	0xkaz/llm-governance-dashboard
Stars	2	2	2
Language	Python	Python	Python
Setup difficulty	easy	moderate	hard
Complexity	2/5	2/5	4/5
Audience	researcher	general	ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy Time to first run · 5min

Requires a Sarvam AI API key from api.sarvam.ai to reproduce the findings, the pip dependency is just the requests library.

Free to read and share with attribution for non-commercial purposes, no modifications allowed and no commercial use permitted.

In plain English

This repository publishes a security assessment of Sarvam-105B, a large language model made by Sarvam AI and accessed through their API. The researcher found and documented eight weaknesses, submitted a report to Sarvam AI in May 2026, and waited 32 days for a response. When no follow-up came, the report was published publicly under standard responsible disclosure practices. The three most serious findings are specific to Sarvam's deployment. The first is identity fragility: when you call the Sarvam-105B API with a neutral system message, the model responds claiming to be Google Gemini instead of Sarvam AI. When you add a tools array to the API call, it claims to be OpenAI ChatGPT. This happens without any deliberate manipulation and affects any standard API deployment that uses system messages or function calling. The second high-severity finding is reasoning content leakage: the API's reasoning field in its responses can expose the contents of the system prompt, which developers typically treat as private. The five lower-severity findings describe prompt injection weaknesses that are common across the AI industry and not unique to this model. The repository includes the full PDF report (20 pages), pre-captured JSON evidence files showing the raw API requests and responses for each of the eight vulnerabilities, and Python test scripts so anyone with a Sarvam API key can reproduce the findings. The most basic finding can be verified with a single curl command against the public API. Vendor acknowledgment arrived the day after the report was submitted, but no follow-up came within the stated 32-day window. The researcher then published the full disclosure, including the correspondence with Sarvam AI.

Copy-paste prompts

Prompt 1

Help me understand how the Sarvam-105B identity fragility vulnerability works and why adding a tools array causes the model to claim it is ChatGPT.

Prompt 2

Write a Python script using the requests library to call api.sarvam.ai with a neutral system message and log whether the model identifies itself correctly.

Prompt 3

Explain what reasoning content leakage means in LLM APIs and what an attacker could learn from an exposed reasoning_content field.

Prompt 4

How does responsible disclosure work for AI model vulnerabilities? Walk me through the SARVAM-2026-001 timeline as an example.

Frequently asked questions

What is sarvam-2026-001?

A published security assessment reporting eight vulnerabilities in Sarvam AI's 105B language model API, including identity spoofing and system prompt leakage.

What language is sarvam-2026-001 written in?

Mainly Python. The stack also includes Python, Sarvam AI API.

What license does sarvam-2026-001 use?

Free to read and share with attribution for non-commercial purposes, no modifications allowed and no commercial use permitted.

How hard is sarvam-2026-001 to set up?

Setup difficulty is rated easy, with roughly 5min to a first successful run.

Who is sarvam-2026-001 for?

Mainly researcher.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub flawme on gitmyhub

Verify against the repo before relying on details.