meta-llama/codellama

★ 16,327PythonAudience · developerComplexity · 4/5Setup · hard

Mindmap

mindmap
  root((Code Llama))
    Model Variants
      Base completion
      Python specialized
      Instruct chat style
    Model Sizes
      7B and 13B
      34B and 70B
    Key Features
      Code infilling
      100K token context
      Local self-hosting
    Requirements
      CUDA GPU
      PyTorch
      Meta download access

mindmap root((Code Llama)) Model Variants Base completion Python specialized Instruct chat style Model Sizes 7B and 13B 34B and 70B Key Features Code infilling 100K token context Local self-hosting Requirements CUDA GPU PyTorch Meta download access

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Run a local AI code completion model that fills in gaps in existing code without sending data to a third-party API

USE CASE 2

Self-host a 7B coding model for fast autocomplete in a private development environment

USE CASE 3

Use Code Llama Instruct in a conversational style to ask coding questions and get detailed answers

USE CASE 4

Build a custom code review or generation pipeline using the 34B or 70B model for higher accuracy

Tech stack

PythonPyTorchCUDA

Getting it running

Difficulty · hard Time to first run · 1day+

Requires CUDA GPU with at least 12.5 GB VRAM for the smallest model and Meta download approval before you can start.

In plain English

Code Llama is a family of large language models (AI systems trained on vast amounts of text and code) released by Meta, specialized for understanding and generating code. This repository contains the Python inference code, the scripts needed to load Code Llama model weights and run them locally to get predictions. The family comes in multiple flavors: base models (Code Llama) for code completion, Python-specialized models (Code Llama - Python) tuned further on Python code, and instruction-following models (Code Llama - Instruct) that you can prompt in conversational style to ask coding questions. Each flavor is available in sizes of 7 billion, 13 billion, 34 billion, and 70 billion parameters, larger models are generally more capable but require more memory and hardware. The 7B model requires about 12.55 GB of storage, while the 70B model requires about 131 GB. A notable feature is code infilling: the 7B and 13B base and instruct models can fill in a gap in existing code based on the surrounding context, useful for autocomplete-style features. All models support input contexts of up to 100,000 tokens, meaning they can consider large amounts of existing code when generating. To use the models, you request download access via Meta's website, download the weights, and run inference locally using PyTorch with CUDA (a GPU computing framework). This is for developers who want to run Code Llama on their own infrastructure rather than calling a hosted API. The full README is longer than what was provided.

Copy-paste prompts

Prompt 1

How do I download and run Meta's Code Llama 7B model locally to autocomplete Python functions using this repository?

Prompt 2

Write a Python script using the Code Llama inference code to fill in a gap marked with a placeholder in an existing function

Prompt 3

What GPU memory do I need to run the Code Llama 34B model from this repository, and how do I start it?

Prompt 4

How do I use Code Llama Instruct to ask it to refactor a piece of code in a conversational back-and-forth style?

Prompt 5

Walk me through requesting download access from Meta and setting up the model weights for the 13B Code Llama model

Open on GitHub → Explain another repo

← meta-llama on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.