Replicate the CVPR 2026 AwareVLN results on the R2R and RxR navigation benchmarks inside a simulated 3D building environment.
Use the automatic reasoning annotation pipeline to generate decision-point labels for your own navigation dataset without manual annotation.
Download the pre-trained AwareVLN weights and fine-tune them on a custom set of navigation instructions.
Study how a single model switches between a reasoning mode and an action mode to correct its own course during navigation.
Requires building several older software libraries from source and downloading gigabytes of 3D scene data before evaluation can run.
AwareVLN is a research project accepted at CVPR 2026, a major computer vision conference, that focuses on teaching AI agents to navigate through spaces by following spoken or written directions. The core challenge is getting an agent to correctly interpret an instruction like "go to the counter and turn right at the lamp" while moving step by step through a simulated building environment.

The project's central contribution is adding what the researchers call self-aware reasoning to the navigation process. Rather than acting immediately at every step, the system can pause at key decision points, compare what it is seeing with what the instruction expects, and then decide whether to continue or correct course. A single model handles both the thinking and the acting, switching between a reasoning mode and an action mode depending on the situation.

To train the model, the team built an automatic labeling process that generates reasoning annotations for existing navigation datasets, so decision points do not have to be labeled by hand. The model starts from an existing pretrained navigation system called NaVILA and is further trained on these automatically labeled examples.

Evaluation uses two standard research benchmarks, R2R and RxR, both run inside a simulated 3D building environment. Running the evaluation requires building several older software libraries from source and downloading several gigabytes of 3D scene data, so the setup is aimed at other researchers in the field rather than general users.

The code, pre-trained weights, and labeled dataset are publicly available through links in the README. This is academic research software, useful for anyone studying AI-driven navigation systems, but not a ready-to-use product.
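The switch between a reasoning mode and an action mode described above can be illustrated with a toy control loop. This is only a sketch and not the AwareVLN code: the names `Step`, `reason`, `act`, and `run_episode` are hypothetical, observations are plain strings, and simple functions stand in for what the project does with a single learned model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One navigation step in a toy episode (hypothetical structure)."""
    observation: str         # what the agent currently sees
    expected: str            # landmark the instruction expects here
    is_decision_point: bool  # whether the agent should pause and reason


def reason(step: Step) -> str:
    """Reasoning mode: compare the observation with the expectation."""
    return "continue" if step.expected in step.observation else "correct"


def act(decision: str) -> str:
    """Action mode: turn a decision into a low-level action."""
    return "move_forward" if decision == "continue" else "turn_around"


def run_episode(steps: list[Step]) -> list[tuple[str, str]]:
    """Walk through the steps, reasoning only at decision points."""
    trace = []
    for step in steps:
        if step.is_decision_point:
            decision = reason(step)
            trace.append(("reason", decision))
        else:
            # Away from decision points the agent acts without pausing.
            decision = "continue"
        trace.append(("act", act(decision)))
    return trace


if __name__ == "__main__":
    episode = [
        Step("a long hallway", "hallway", False),
        Step("a lamp next to the counter", "lamp", True),
        Step("a blank wall", "counter", True),
    ]
    for mode, value in run_episode(episode):
        print(mode, value)
```

In this toy version the agent acts directly on the first step, confirms the lamp at the second, and corrects course at the third because the expected counter is not in view.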
Verify against the repo before relying on details.