Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators Paper • 2606.06476 • Published 14 days ago
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis Paper • 2605.18451 • Published May 18
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions Paper • 2308.09936 • Published Aug 19, 2023
Matryoshka Query Transformer for Large Vision-Language Models Paper • 2405.19315 • Published May 29, 2024