Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 8 days ago • 97
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 14 days ago • 143
Lance: Unified Multimodal Modeling by Multi-Task Synergy Paper • 2605.18678 • Published 10 days ago • 74
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published 14 days ago • 34
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 122
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 163
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation Paper • 2604.09132 • Published Apr 10 • 55
PBR Material Collection 3D Asset Generation, Rendering, Inverse Rendering • 3 items • Updated 8 days ago
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation Paper • 2604.02289 • Published Apr 2 • 13