Send a one-shot prompt to GPT-4 from your shell
Pipe a file into the CLI to summarize or transform it
Stream Claude responses token by token in the terminal
Talk to local llama2 or mistral via Ollama without leaving the prompt
You still need to provide an API key for hosted models, or run Ollama locally.
This project is a small command-line tool, written in Python, that lets you send prompts to large language models without leaving your terminal. You install it with pip install llm-cli, then call it as llm-cli followed by a quoted prompt. The README shows examples such as asking it to explain recursion in Python, write a haiku about debugging, generate test data, or simply answer "what is 2 plus 2".

The tool offers a set of flags for changing how the request is made. You can pick a model with the model flag, adjust randomness with temperature on a 0.0 to 2.0 scale, cap reply length with max tokens, stream the answer token by token, read the prompt from standard input by piping a file in, or use a multi-line input mode that ends on Ctrl-D. A system flag sets a system prompt, and the API key can be passed as a flag or read from the LLM_API_KEY environment variable.

Defaults can be set in two ways. Three environment variables, LLM_API_KEY, LLM_DEFAULT_MODEL, and LLM_API_BASE, control the credentials, the default model, and the base URL of the API. The same values, along with temperature and max tokens, can also live in a JSON config file at ~/.llm-cli/config.json.

The README lists three model families it claims to support: OpenAI models gpt-4 and gpt-3.5-turbo, Anthropic models claude-3-opus and claude-3-sonnet, and local models served through Ollama, namely llama2 and mistral.

The project is released under the MIT license. The README says pull requests are welcome, but asks that you open an issue first for big changes.
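To make the two ways of setting defaults more concrete, here is a minimal Python sketch of how a JSON config file and the three environment variables might be merged. It is not the project's actual code. The environment variable names, the config path, and the 0.0 to 2.0 temperature range come from the summary above; the function name load_settings, the JSON key names (api_key, default_model, api_base, temperature), and the rule that environment variables override the file are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path

# Config location and environment variable names are taken from the summary.
CONFIG_PATH = Path.home() / ".llm-cli" / "config.json"

# Hypothetical mapping from environment variable to config key.
# The real key names used inside config.json are an assumption here.
ENV_TO_KEY = {
    "LLM_API_KEY": "api_key",
    "LLM_DEFAULT_MODEL": "default_model",
    "LLM_API_BASE": "api_base",
}


def load_settings(config_path=CONFIG_PATH, environ=None):
    """Merge the JSON config file with environment variables.

    Assumption: environment variables take precedence over the file.
    The summary does not say which source wins.
    """
    environ = os.environ if environ is None else environ
    settings = {}

    # Start from the config file, if one exists.
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            settings.update(json.load(fh))

    # Overlay any environment variables that are set and non-empty.
    for env_name, key in ENV_TO_KEY.items():
        value = environ.get(env_name)
        if value:
            settings[key] = value

    # Enforce the documented 0.0 to 2.0 temperature scale.
    temperature = settings.get("temperature")
    if temperature is not None and not 0.0 <= float(temperature) <= 2.0:
        raise ValueError(
            f"temperature must be between 0.0 and 2.0, got {temperature}"
        )

    return settings
```

Calling load_settings() with no arguments reads ~/.llm-cli/config.json and the current process environment. Passing an explicit path and a plain dict instead makes the merge easy to exercise without touching real credentials.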
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.