Run an LLM design proposal through a security and cost critique before writing code
Drop TrustEngine into an existing app to audit AI-generated output and get a pass, review, or reject verdict
Pair OpenAI and Anthropic models so each one critiques the other's plan
Produce a SHA-256 evidence chain showing which model said what during an AI task
Needs Python 3.9+ and at least one LLM API key; cross-provider pairing is recommended but adds a second key and billing setup.
AI Flow Architect is a Python framework for running an AI task through more than one model on purpose, so the models can check each other's work. The tagline in the README is "AI proposes. You decide." The author opens by saying that a single language model has no way to catch its own blind spots, and gives the example of asking GPT-4 to design a login system and getting back code that uses MD5 for password hashing with no rate limiting. The project's response to that is to wire two independent AI brains, ideally from different providers, into a fixed pipeline that you approve at each step.
The full framework, called FlowArchitect, runs a task in stages. Brain 1, the planner, takes your request and produces a step-by-step blueprint with risk notes. An "opponent brain" then attacks that blueprint from five different angles: security audit, cost, user empathy, data rigor, and minimalism. You review and approve the plan, after which an "expert team" of session-isolated agents (creative, evaluator, programmer, reviewer) carries out the work. Finally Brain 2, the arbiter, run on a different model, compares the finished output against the original blueprint line by line and produces a quality report.
There is also a standalone piece called TrustEngine you can drop into an existing project without the rest of the framework. You call engine.audit with the original requirement, the AI-generated output, and a context object describing the project. You get back a verdict (pass, review, or reject), a confidence score, findings with severity, risk points, an "uncertainty" section where the engine states what it does not know, optional votes from several arbiters, and a SHA-256 hashed evidence chain.
The workflow is fixed rather than free-form, which the author treats as a feature: every task follows the same gated pipeline.
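The summary does not say how the SHA-256 evidence chain is built, so the following is only a minimal sketch of the general technique, a hash chain, using Python's standard hashlib. It does not use TrustEngine's API. The function names (chain_hash, build_chain, verify_chain), the record fields, and the all-zero starting hash are all assumptions made for illustration. The idea is that each entry's hash covers the previous entry's hash, so editing any earlier record invalidates everything after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def chain_hash(prev_hash, record):
    # Serialize deterministically so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records):
    # Link each record to the hash of the one before it.
    chain = []
    prev = GENESIS
    for record in records:
        digest = chain_hash(prev, record)
        chain.append({"record": record, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain):
    # Recompute every hash; any edited record or broken link fails the check.
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if chain_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


# Hypothetical records of which model said what at each stage.
records = [
    {"stage": "planner", "model": "model-a", "output": "blueprint v1"},
    {"stage": "opponent", "model": "model-b", "output": "MD5 used for passwords"},
    {"stage": "arbiter", "model": "model-b", "output": "verdict: review"},
]
chain = build_chain(records)
print(verify_chain(chain))

# Tampering with an earlier record breaks verification.
chain[1]["record"]["output"] = "no issues found"
print(verify_chain(chain))
```

The first print reports a valid chain and the second reports an invalid one, since the stored hash for the edited record no longer matches its content. The project's actual record format and hashing scheme may differ, so check the repo before relying on this.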
One API key is enough to start because Brain 2 will auto-pick a cheaper model from the same provider, but the README says cross-provider pairing, for example OpenAI plus Anthropic, gives the strongest checks because the two models have different training data and failure modes. OpenAI (gpt-4o family, gpt-4-turbo, gpt-3.5-turbo) and Anthropic (Claude 3.5 Sonnet, 3.5 Haiku, 3 Opus) are marked production-tested. DashScope, Zhipu GLM, Moonshot, DeepSeek, and local Ollama models are listed as community-ready through an OpenAI-compatible protocol, but they still need verification by the user. The project is in alpha, requires Python 3.9 or newer, is licensed under Apache 2.0, and reports 177 passing tests.
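The pairing rule described above can be sketched as a small selection function. This is a hypothetical illustration, not the framework's own code: the function name pick_arbiter, the two lookup tables, and the model identifier strings are assumptions chosen to mirror the models named in the summary. It prefers a different provider for Brain 2 when a second key is available and otherwise falls back to a cheaper model from the planner's provider.

```python
from typing import Set, Tuple

# Assumed identifiers, for illustration only; real model IDs may differ.
STRONG_MODEL = {"openai": "gpt-4o", "anthropic": "claude-3-5-sonnet"}
CHEAPER_MODEL = {"openai": "gpt-3.5-turbo", "anthropic": "claude-3-5-haiku"}


def pick_arbiter(planner_provider: str, available_keys: Set[str]) -> Tuple[str, str]:
    """Return (provider, model) for the arbiter, preferring a different provider."""
    # Cross-provider pairing: use any other provider we hold a key for.
    others = sorted(
        p for p in available_keys if p != planner_provider and p in STRONG_MODEL
    )
    if others:
        return others[0], STRONG_MODEL[others[0]]
    # Single-key fallback: a cheaper model from the same provider.
    if planner_provider in available_keys and planner_provider in CHEAPER_MODEL:
        return planner_provider, CHEAPER_MODEL[planner_provider]
    raise ValueError("no usable API key for an arbiter model")


print(pick_arbiter("openai", {"openai"}))
print(pick_arbiter("openai", {"openai", "anthropic"}))
```

With only an OpenAI key the sketch returns the cheaper OpenAI model, and with both keys it returns the Anthropic model, which matches the trade-off the README describes between convenience and independent checks.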
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.