ucsc-vlaa/clinseekagent

★ 13
Python
Audience · researcher
Complexity · 5/5
Setup · hard

Mindmap

mindmap
  root((ClinSeekAgent))
    What it does
      Clinical AI reasoning
      Active evidence seeking
      Curated vs raw comparison
    Agent Tools
      EHR query tool
      Browser search tool
      Chest X-ray analyzer
    Models Tested
      Claude
      Qwen
      ClinSeek-35B-A3B
    Use Cases
      Clinical benchmarking
      Student model training
      Multimodal reasoning
    Setup
      GPU required
      MIMIC data needed
      Four separate servers


Things people build with this

USE CASE 1

Evaluate whether an AI model performs better when it actively queries raw patient records versus reading a pre-selected summary for a clinical question.

USE CASE 2

Analyze chest X-ray images as part of an AI-driven clinical reasoning pipeline that calls imaging tools on demand.

USE CASE 3

Train a smaller student model on clinical reasoning trajectories generated by the larger agent system.

USE CASE 4

Benchmark clinical AI models across text-only EHR tasks and multimodal medical imaging tasks.
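The comparison in use case 1 comes down to scoring the same questions under two conditions and reporting the difference in percentage points. The following is a minimal sketch of that arithmetic, not the repository's evaluation code: the function names `accuracy` and `gain_in_points` are hypothetical, and the answer lists are made-up toy data, not results from the paper.

```python
from typing import List


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    if not gold:
        raise ValueError("gold answers must not be empty")
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def gain_in_points(curated_acc: float, active_acc: float) -> float:
    """Difference between two accuracies, expressed in percentage points."""
    return (active_acc - curated_acc) * 100


# Toy data for illustration only (hypothetical, not from the paper).
gold = ["A", "B", "C", "D"]
curated_preds = ["A", "B", "A", "A"]  # 2 of 4 correct
active_preds = ["A", "B", "C", "A"]   # 3 of 4 correct

curated_acc = accuracy(curated_preds, gold)
active_acc = accuracy(active_preds, gold)
gain = gain_in_points(curated_acc, active_acc)
print(f"curated={curated_acc:.2f} active={active_acc:.2f} gain={gain:+.1f} points")
```

Reporting the gap in percentage points rather than as a relative change keeps it directly comparable across task categories with different baseline accuracies.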

Tech stack

Python, Claude, Qwen

Getting it running

Difficulty · hard
Time to first run · 1 day+

Requires credentialed access to the MIMIC patient dataset and GPU hardware for the image analysis and training components.

No license information is mentioned in the explanation.

In plain English

ClinSeekAgent is a research system from UC Santa Cruz that tests whether AI models reason about clinical cases better when they actively search for evidence themselves than when they are handed pre-selected information. The core question: does giving a model access to raw patient records, medical image tools, and external knowledge sources lead to better clinical decisions than reading a curated summary?

The system places a host AI model inside an agent loop. The paper evaluates several host models, including Claude and Qwen. The agent has access to three types of tools: one for querying patient-level electronic health record tables, one for searching external medical knowledge via a browser, and one for analyzing chest X-ray images. The model can call these tools in sequence to gather evidence before producing an answer, much like a clinician consulting different sources before making a diagnosis.

Results from the paper show that active evidence-seeking often outperforms the curated baseline. On text-only EHR tasks, most evaluated models improve when given raw access instead of pre-selected snippets. The gap is larger for multimodal tasks involving imaging: one model gained over 34 percentage points on a specific reasoning category when allowed to actively query images rather than receiving them pre-processed.

The repository also includes a recipe for training a smaller student model on the trajectories generated by the larger agent system. The resulting model (ClinSeek-35B-A3B) reaches performance close to its teacher on an external benchmark while being significantly smaller than the largest closed-source models it was compared against.

The codebase is split into four separate roles (agent driver, EHR server, image server, training), each with its own dependencies, because the image and training components require GPU hardware and specific library versions. Patient data is not included in the repository; it must be obtained separately from credentialed sources such as the MIMIC dataset.
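The agent loop described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the repository's implementation: the tool names (`ehr_query`, `browser_search`, `cxr_analyze`), the `ToolCall` and `FinalAnswer` types, the `run_agent` function, and the scripted stand-in for the host model are all hypothetical, and the tools return placeholder strings instead of touching real patient data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class ToolCall:
    tool: str      # which tool the host model wants to call
    argument: str  # free-text argument passed to that tool


@dataclass
class FinalAnswer:
    text: str


Step = Union[ToolCall, FinalAnswer]


# Hypothetical stand-ins for the three tool types; they only echo their input.
def query_ehr(argument: str) -> str:
    return f"[ehr rows for: {argument}]"


def search_browser(argument: str) -> str:
    return f"[search results for: {argument}]"


def analyze_cxr(argument: str) -> str:
    return f"[image findings for: {argument}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "ehr_query": query_ehr,
    "browser_search": search_browser,
    "cxr_analyze": analyze_cxr,
}


def run_agent(
    host_model: Callable[[str, List[str]], Step],
    question: str,
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Let the host model call tools until it answers or the step budget runs out."""
    evidence: List[str] = []
    for _ in range(max_steps):
        step = host_model(question, evidence)
        if isinstance(step, FinalAnswer):
            return step.text, evidence
        tool = TOOLS.get(step.tool)
        if tool is None:
            # Report the bad call back to the model instead of crashing.
            evidence.append(f"error: unknown tool {step.tool!r}")
            continue
        evidence.append(f"{step.tool}: {tool(step.argument)}")
    return "no answer within step budget", evidence


def scripted_model(question: str, evidence: List[str]) -> Step:
    """Stand-in for a real host model: follows a fixed two-call plan, then answers."""
    plan = [ToolCall("ehr_query", "latest labs"), ToolCall("cxr_analyze", "study-1")]
    if len(evidence) < len(plan):
        return plan[len(evidence)]
    return FinalAnswer(f"answer based on {len(evidence)} pieces of evidence")


answer, trace = run_agent(scripted_model, "Does the patient show signs of pneumonia?")
print(answer)
for line in trace:
    print(" ", line)
```

In a real setup the scripted model would be replaced by a call to the host model, and each tool would forward to its own server. The accumulated evidence list, paired with the final answer, is the kind of trajectory a student model could later be trained on.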

Copy-paste prompts

Prompt 1

Set up the ClinSeekAgent EHR server using MIMIC patient data and run an experiment comparing active-query mode versus the curated-baseline mode on a set of clinical questions.

Prompt 2

Run the training recipe in this repo to distill ClinSeek-35B-A3B from the full agent system's trajectories and evaluate it on the external benchmark.

Prompt 3

Configure the three-tool agent loop with Claude as the host model, point it at a local EHR server, and test its chest X-ray reasoning on a sample case.


Verify against the repo before relying on details.