view article Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) ariG23498 • Jan 19, 2025 • 50
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83