Get AI code suggestions inside VS Code without sending your code to any external server or paying for a subscription.
Run a private, offline code completion tool on a laptop with at least 4 GB of RAM using just a CPU.
Use GPU acceleration via Docker to get faster code suggestions on a machine with an NVIDIA graphics card.
Swap in a local AI backend for the vscode-fauxpilot extension as a free, self-hosted Copilot alternative.
Setup requires downloading a model separately from Hugging Face (several GB). Use the pre-built binary for the simplest setup, or Docker for GPU support. The project is archived and receives no new updates.
TurboPilot was an open source project that let developers run an AI code completion tool entirely on their own computer, without sending code to any external server. It worked as a local alternative to GitHub Copilot, the popular AI coding assistant that runs in the cloud. The project is now archived and no longer maintained as of September 2023, with the author noting that more mature alternatives are available.

While it was active, TurboPilot ran large language models trained on code directly on a regular CPU, needing as little as 4 gigabytes of RAM for smaller models. It supported several models, including Salesforce Codegen, WizardCoder, Starcoder, and StableCode. Models were downloaded separately from Hugging Face, a public repository for machine learning models, and then loaded by the TurboPilot server process. Users with more RAM or a capable GPU could run larger, more accurate models.

The server started on a local port and presented an API compatible with the Copilot and OpenAI format. This meant it could work as a drop-in backend for VS Code extensions like vscode-fauxpilot, which would send code to the local server instead of to GitHub's servers. The result was code completion suggestions that appeared inline as you typed, all processed on your own machine.

Installation came in two forms: a pre-built binary for direct execution, or a Docker container image. The Docker version also supported NVIDIA GPU acceleration via CUDA, which significantly increased the speed of generating suggestions. For GPU use, specific CUDA versions of the container image were provided.

The project was written in C++ and was based on work from the llama.cpp and fauxpilot projects. It was aimed at developers who wanted private, self-hosted code completion without a paid subscription or cloud dependency.
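To illustrate what talking to an OpenAI-style completion API on a local port might look like, here is a minimal Python sketch using only the standard library. The port (18080), the endpoint path, and the request and response field names are assumptions for illustration; the summary above does not state them, so check the repo before relying on any of them. The function names are hypothetical helpers, not part of TurboPilot.

```python
import json
import urllib.request

# Assumed values: the summary does not give the port or the endpoint path.
BASE_URL = "http://localhost:18080"
COMPLETIONS_PATH = "/v1/engines/codegen/completions"


def build_request(prompt, max_tokens=64, base_url=BASE_URL):
    """Build a POST request carrying an OpenAI-style completion payload."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        base_url + COMPLETIONS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_completions(response_body):
    """Return the text of each choice in an OpenAI-style JSON response."""
    data = json.loads(response_body)
    return [choice.get("text", "") for choice in data.get("choices", [])]


def complete(prompt, **kwargs):
    """Send the prompt to the local server and return the suggested completions."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_completions(response.read().decode("utf-8"))
```

With a server running locally, calling complete("def add(a, b):") would return a list of suggested continuations. Request construction and response parsing are kept separate from the network call so they can be checked without a running server.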
Author: ravenscroftj. Every repo by this author is listed on their gitmyhub profile.
Verify against the repo before relying on details.