explaingit

jjihwan/liteframe

Analysis updated 2026-06-24

Stars · 14 | Audience · researcher | Complexity · 1/5 | Setup · hard

TLDR

Placeholder repo for the LiteFrame paper, a vision encoder that helps Video LLMs scale to many frames. Code and weights are not released yet.

Mindmap

mindmap
  root((LiteFrame))
    Inputs
      Long videos
      Many frames
    Outputs
      Paper PDF
      Project page
      BibTeX
    Use Cases
      Cite the paper
      Watch for code drop
    Tech Stack
      Vision Transformer
      Video LLM
    Status
      Code coming soon
      Weights coming soon

What do people build with it?

USE CASE 1

Cite the LiteFrame paper in your own video understanding research

USE CASE 2

Watch the repo for the upcoming code and weight release

USE CASE 3

Read the arXiv preprint to learn how to scale frame counts in Video LLMs

What is it built with?

ViT, Video LLM

How does it compare?

                   jjihwan/liteframe   0c33/agentic-ai   0xbebis/hyperpay
Stars              14                  14                14
Language           -                   Python            TypeScript
Setup difficulty   hard                hard              hard
Complexity         1/5                 4/5               5/5
Audience           researcher          developer         developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

No runnable code exists yet; the repo only hosts the paper and a release-pending note.

In plain English

LiteFrame is the official GitHub repository for a research paper titled "LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs". The work comes from a team at Google DeepMind together with Seoul National University, with Jihwan Kim listed as the first author and other authors including Nikhil Parthasarathy, Danfeng Qin, Junhwa Hur, Deqing Sun, Bohyung Han, Ming-Hsuan Yang, and Boqing Gong.

The README's own one-sentence summary calls the project a highly efficient video encoder for Video Large Language Models that aims to unlock scalable, long-form video understanding by addressing inefficiencies in both the language model and the Vision Transformer (ViT). In other words, the paper is about making it cheaper and more practical to feed many frames of a video into a model that combines vision and language, rather than only being able to look at a handful of frames.

It is important to be plain about the current state of the repository. The README contains a clearly marked note that the code and model weights will be released soon. As of the README's news entry dated 2026.05.18, only the paper itself has been posted to arXiv. There is a 1-minute overview video linked in the README and a project page hosted on the first author's site, but no runnable training or inference code is present yet.

Because the README is short and almost entirely about author credits, paper links, and the planned release, there is no install guide, no usage example, no benchmark numbers, and no description of the LiteFrame architecture itself in the text shown here. The repository acts as a placeholder that lets people cite the paper and watch for the upcoming code drop.

A BibTeX citation block is included for researchers who want to reference the work in their own papers. The arXiv preprint number is 2605.17260, and the project page is at jjihwan.github.io/projects/LiteFrame.

Anyone interested in actually running LiteFrame will need to wait for the authors to publish the code and weights.
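
In the meantime, the cost problem the paper targets can be illustrated with back-of-the-envelope arithmetic. The sketch below is not LiteFrame's method, which is not described here. It assumes a plain ViT that splits each 224x224 frame into 16x16 patches with no token reduction, and it assumes self-attention cost grows with the square of the token count. The function names and default sizes are illustrative choices, not values from the paper or README.

```python
def vit_tokens_per_frame(image_size: int = 224, patch_size: int = 16) -> int:
    """Patch tokens a plain ViT produces for one square frame (assumed sizes)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def video_token_count(num_frames: int, image_size: int = 224, patch_size: int = 16) -> int:
    """Total visual tokens when every frame is encoded independently."""
    return num_frames * vit_tokens_per_frame(image_size, patch_size)


def relative_attention_cost(num_tokens: int, baseline_tokens: int) -> float:
    """Quadratic self-attention cost relative to a baseline sequence length."""
    return (num_tokens / baseline_tokens) ** 2


if __name__ == "__main__":
    baseline = video_token_count(8)  # hypothetical 8-frame baseline
    for frames in (8, 64, 512):
        tokens = video_token_count(frames)
        cost = relative_attention_cost(tokens, baseline)
        print(f"{frames:>4} frames -> {tokens:>7} tokens, ~{cost:.0f}x attention cost")
```

Under these assumptions, going from 8 to 64 frames multiplies the token count by 8 and the attention cost by roughly 64, which is the kind of growth that makes naive frame scaling impractical.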

Copy-paste prompts

Prompt 1
Summarise the LiteFrame paper's main idea for efficient video encoding in plain English
Prompt 2
Compare LiteFrame's claimed approach to existing Video LLM frame sampling tricks
Prompt 3
Draft a checklist of what I should test once LiteFrame's code and weights are released
Prompt 4
Explain why feeding many frames into a Video LLM is expensive and how LiteFrame plans to fix it

Frequently asked questions

What is liteframe?

Placeholder repo for the LiteFrame paper, a vision encoder that helps Video LLMs scale to many frames. Code and weights are not released yet.

How hard is liteframe to set up?

Setup difficulty is rated hard, with roughly 1 day+ to a first successful run.

Who is liteframe for?

Mainly researchers.

Verify against the repo before relying on details.