Stand up a self-hosted OpenAI-compatible chat API on three AWS VMs with one terraform apply.
Study a worked example of public/private subnet separation with a bastion-only SSH path.
Reuse the GitHub Actions selective Docker build pattern for any multi-image monorepo.
Swap Qwen3-0.6B for another small model and benchmark the same network shape.
Needs AWS credentials with EC2/VPC/NAT permissions and an admin IP allowlist for SSH; full apply takes 3-5 minutes plus user-data bootstrap.
This repository is a DevOps assignment that sets up a small language model inference service spread across three Amazon EC2 virtual machines in the Mumbai AWS region. The model is Qwen3-0.6B, a compact open weights chat model. Requests come in through a JSON HTTP API that mimics the OpenAI chat completions format, so existing clients can talk to it with minor changes. The three machines play different roles. VM1 sits in a public subnet and runs Nginx as a reverse proxy plus the iii engine, a Rust binary that orchestrates the workers. VM2 sits in a private subnet and runs a TypeScript caller worker that translates HTTP requests into internal RPC calls. VM3, also private, runs a Python inference worker that loads the model and produces answers. Everything is wrapped in a dedicated virtual private cloud called iii-vpc. Network security follows a standard pattern. VM1 is the only machine reachable from the internet, on port 80 for HTTP and port 22 for SSH from the admin IP only. The two workers have no public IP and can only be reached from inside the VPC. They use a NAT gateway in the public subnet for outbound traffic, so they can pull docker images and packages without ever accepting inbound connections from the internet. SSH to the workers must go through VM1 acting as a bastion host. The whole stack is provisioned with Terraform. A single terraform apply creates the VPC, the public and private subnets, the internet and NAT gateways, two security groups, and the three EC2 instances. After three to five minutes the user data scripts finish, and a curl to the public IP confirms the service is up. A terraform destroy tears it all down. All three services ship as Docker images on Docker Hub. A GitHub Actions workflow rebuilds them selectively: only the images whose source files changed in a push to main are rebuilt and pushed, using Docker Buildx with the GitHub Actions layer cache for speed.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.