explaingit

faizanfirdousi/alchemyst-assign

1HCLAudience · ops devopsComplexity · 4/5ActiveSetup · hard

TLDR

Terraform setup for a three-EC2 LLM inference service on AWS Mumbai. Nginx and a Rust orchestrator sit in public; a TypeScript caller and a Python Qwen3-0.6B worker run in private subnets.

Mindmap

mindmap
  root((alchemyst-assign))
    Inputs
      OpenAI-style JSON requests
      Terraform variables
    Outputs
      Chat completions
      Three EC2 VMs
      VPC and subnets
    Use Cases
      DevOps assignment
      Self-hosted LLM API
      Bastion network demo
    Tech Stack
      Terraform
      AWS
      Docker
      Rust
      TypeScript
      Python
      Qwen3

Things people build with this

USE CASE 1

Stand up a self-hosted OpenAI-compatible chat API on three AWS VMs with one terraform apply.

USE CASE 2

Study a worked example of public/private subnet separation with a bastion-only SSH path.

USE CASE 3

Reuse the GitHub Actions selective Docker build pattern for any multi-image monorepo.

USE CASE 4

Swap Qwen3-0.6B for another small model and benchmark the same network shape.

Tech stack

TerraformAWSDockerRustTypeScriptPythonNginx

Getting it running

Difficulty · hard Time to first run · 1h+

Needs AWS credentials with EC2/VPC/NAT permissions and an admin IP allowlist for SSH; full apply takes 3-5 minutes plus user-data bootstrap.

In plain English

This repository is a DevOps assignment that sets up a small language model inference service spread across three Amazon EC2 virtual machines in the Mumbai AWS region. The model is Qwen3-0.6B, a compact open weights chat model. Requests come in through a JSON HTTP API that mimics the OpenAI chat completions format, so existing clients can talk to it with minor changes. The three machines play different roles. VM1 sits in a public subnet and runs Nginx as a reverse proxy plus the iii engine, a Rust binary that orchestrates the workers. VM2 sits in a private subnet and runs a TypeScript caller worker that translates HTTP requests into internal RPC calls. VM3, also private, runs a Python inference worker that loads the model and produces answers. Everything is wrapped in a dedicated virtual private cloud called iii-vpc. Network security follows a standard pattern. VM1 is the only machine reachable from the internet, on port 80 for HTTP and port 22 for SSH from the admin IP only. The two workers have no public IP and can only be reached from inside the VPC. They use a NAT gateway in the public subnet for outbound traffic, so they can pull docker images and packages without ever accepting inbound connections from the internet. SSH to the workers must go through VM1 acting as a bastion host. The whole stack is provisioned with Terraform. A single terraform apply creates the VPC, the public and private subnets, the internet and NAT gateways, two security groups, and the three EC2 instances. After three to five minutes the user data scripts finish, and a curl to the public IP confirms the service is up. A terraform destroy tears it all down. All three services ship as Docker images on Docker Hub. A GitHub Actions workflow rebuilds them selectively: only the images whose source files changed in a push to main are rebuilt and pushed, using Docker Buildx with the GitHub Actions layer cache for speed.

Copy-paste prompts

Prompt 1
Run terraform apply on alchemyst-assign in ap-south-1, wait for user data to finish, and confirm the chat endpoint with a curl.
Prompt 2
Walk me through how a single chat request flows from Nginx through the Rust iii engine, the TypeScript caller, and the Python worker.
Prompt 3
Replace Qwen3-0.6B in the Python worker with Llama-3.2-1B-Instruct and update the inference code accordingly.
Prompt 4
Add a fourth EC2 worker in private subnet for embedding requests and route /embeddings to it from the Rust engine.
Prompt 5
Audit the security groups and SSH bastion config in alchemyst-assign and suggest hardening changes.
Open on GitHub → Explain another repo

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.