lmms-lab/LLaVA-OneVision-1.5-4B-Instruct
Image-Text-to-Text • 5B • 3.62k • 18
Feeling and building multimodal intelligence.
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling