SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 8 days ago
BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills Paper • 2510.19898 • Published Oct 22, 2025
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published Nov 27, 2025
Steering Autoregressive Music Generation with Recursive Feature Machines Paper • 2510.19127 • Published Oct 21, 2025