Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang
zd21
AI & ML interests
None yet
Recent Activity
authored a paper about 9 hours ago
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
authored a paper about 9 hours ago
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
authored a paper about 9 hours ago
ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
Organizations
None yet