Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping Paper • 2603.23998 • Published Apr 16
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models Paper • 2603.06043 • Published Mar 6
Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding Paper • 2512.10548 • Published 14 days ago
V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention Paper • 2512.03542 • Published Dec 3, 2025
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts Paper • 2509.21892 • Published 26 days ago
Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards Paper • 2210.12050 • Published Oct 21, 2022
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models Paper • 2010.03542 • Published Oct 7, 2020
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Paper • 2107.02137 • Published Jul 5, 2021
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer Paper • 2012.15688 • Published Dec 31, 2020
ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora Paper • 2012.15674 • Published Dec 31, 2020
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Paper • 2112.12731 • Published Dec 23, 2021
Dual Modalities of Text: Visual and Textual Generative Pre-training Paper • 2404.10710 • Published Apr 16, 2024
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages Paper • 2212.06742 • Published Dec 13, 2022
Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Paper • 2410.01610 • Published Oct 2, 2024
Curiosity-Driven Reinforcement Learning from Human Feedback Paper • 2501.11463 • Published Jan 20, 2025
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking Paper • 2502.13842 • Published Feb 19, 2025