Anywhere3D-Bench covers multi-level 3D visual grounding at four levels of granularity (part, object, area, and space), each with distinct types of referring expressions.
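To make the four grounding levels concrete, below is a minimal, hypothetical sketch of what items at each level might look like. The field names and expressions are illustrative assumptions only, not the dataset's actual schema or annotations.

```python
# Hypothetical examples of Anywhere3D-style items, one per grounding level.
# Field names and referring expressions are illustrative assumptions only;
# they do not reflect the dataset's actual schema or annotations.
examples = [
    {"level": "part",   "expression": "the backrest of the chair closest to the window"},
    {"level": "object", "expression": "the lamp on the bedside table"},
    {"level": "area",   "expression": "the area where someone would stand to wash dishes"},
    {"level": "space",  "expression": "the empty space above the desk large enough for a monitor"},
]
# Each item is grounded by a 3D bounding box in the scene, which a model must
# predict from the referring expression and the 3D scene input.
```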
In the Annotation UI, you can explore up to 40 data items by clicking the Load Current ID Annotations button at the bottom of the page. For a detailed guide, use the Tutorial button in the top-right corner.
Here we present a few examples from the Anywhere3D dataset via a data explorer.
To use the data explorer, first select a scene from the selection bar. The corresponding visual grounding examples will then be displayed below. Click on a referring expression to visualize its ground-truth bounding box in the scene. Best viewed on a desktop monitor.
Controls: Click + Drag = Rotate; Ctrl + Drag = Translate; Scroll Up/Down = Zoom In/Out
Results are reported as Acc@0.25IoU on Anywhere3D-Bench.
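For reference, below is a minimal sketch of how Acc@0.25IoU can be computed, assuming axis-aligned predicted and ground-truth boxes given as (center, size); the benchmark's official evaluation script may differ in box parameterization and implementation details.

```python
import numpy as np

def aabb_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each as (cx, cy, cz, dx, dy, dz)."""
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = overlap.prod()
    union = box_a[3:].prod() + box_b[3:].prod() - inter
    return inter / union

def acc_at_iou(pred_boxes, gt_boxes, threshold=0.25):
    """Fraction of samples whose predicted box reaches IoU >= threshold with the ground truth."""
    ious = [aabb_iou(np.asarray(p, float), np.asarray(g, float))
            for p, g in zip(pred_boxes, gt_boxes)]
    return float(np.mean([iou >= threshold for iou in ious]))
```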
Here we present a few qualitative results from Anywhere3D-Bench with Gemini-2.5-pro's reasoning processes.
Green bounding boxes represent the ground truth, while red boxes represent Gemini-2.5-pro's predictions.
Errors in Gemini-2.5-pro's reasoning process are highlighted in bold.
Here we present a comparison between the best-performing non-thinking model (GPT-4.1) and the best-performing thinking model (Gemini-2.5-pro) on Anywhere3D-Bench.
Green bounding boxes represent the ground truth, while red boxes represent each model's predictions.
Errors in Gemini-2.5-pro's reasoning process are highlighted in bold.
We would especially like to thank ScanRefer for providing an excellent 3D annotation interface, which greatly facilitated the annotation process. We also appreciate the modifications SQA3D made to the ScanRefer annotation interface. The annotation interface used in Anywhere3D was adapted from their well-designed interfaces, and we are deeply grateful for their excellent design and generous sharing with the community.
We would also like to thank the following open-source projects:
We also wish to thank the numerous inspiring works on 3D visual grounding and spatial intelligence that have informed and motivated our research, though it is difficult to list all of them here.
@misc{anywhere3d,
title={From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes},
author={Tianxu Wang and Zhuofan Zhang and Ziyu Zhu and Yue Fan and Jing Xiong and Pengxiang Li and Xiaojian Ma and Qing Li},
year={2025},
eprint={2506.04897},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.04897},
}