Submitted by taesiri 60 UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors · 11 authors 1
Submitted by jianlanluo 6 Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies · 16 authors 1
Submitted by unknowncloudw 5 From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills Peking University 1
Submitted by taesiri 2 End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer ByteDance Seed
Submitted by iieycx 1 Online Self-Calibration Against Hallucination in Vision-Language Models · 6 authors 1
Submitted by rajkumarrawal 1 Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization · 4 authors 1
Submitted by praxelhq 1 LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation Praxel 0 1
Submitted by tobiaslee 1 AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval · 5 authors 1
Submitted by iNeil77 - Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring Themis 0 1
Submitted by danielhzlin - Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling · 11 authors 1