Hao Peng

Wesleythu · h-peng17

Recent Activity

upvoted a paper 3 days ago

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Paper • 2606.04923 • Published 5 days ago • 37

upvoted a paper 7 days ago

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Paper • 2605.31584 • Published 10 days ago • 41

upvoted a paper 3 months ago

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Paper • 2603.12201 • Published Mar 12 • 54

liked a dataset 3 months ago

Lossfunk/ISO-Bench

Viewer • Updated Feb 26 • 54 • 49 • 2

updated a collection 3 months ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 4 items • Updated Mar 2 • 2

updated 2 models 3 months ago

THU-KEG/WildReward-8B

Text Classification • 8B • Updated Feb 26 • 12 • 3

THU-KEG/WildReward-4B

Text Classification • 4B • Updated Feb 26 • 18 • 4

liked a dataset 3 months ago

THU-KEG/WildFB

Updated Feb 26 • 36 • 3

updated a collection 3 months ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 4 items • Updated Mar 2 • 2

updated a dataset 3 months ago

THU-KEG/WildFB

Updated Feb 26 • 36 • 3

published a dataset 3 months ago

THU-KEG/WildFB

Updated Feb 26 • 36 • 3

upvoted a paper 4 months ago

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Paper • 2602.08829 • Published Feb 9 • 3

submitted a paper to Daily Papers 4 months ago

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Paper • 2602.08829 • Published Feb 9 • 3

upvoted a collection 4 months ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 4 items • Updated Mar 2 • 2

liked 2 models 4 months ago

THU-KEG/WildReward-8B

Text Classification • 8B • Updated Feb 26 • 12 • 3

THU-KEG/WildReward-4B

Text Classification • 4B • Updated Feb 26 • 18 • 4

updated a collection 4 months ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 4 items • Updated Mar 2 • 2

upvoted a paper 5 months ago

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 48