QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents Paper • 2605.27068 • Published 9 days ago • 24
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 63