Daily Papers

by AK and the research community

Jun 2

Submitted by

ssz1111

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

·
9 authors

Submitted by

anchen1011

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

mindlab-research

Submitted by

tomer-keren

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Technion

Technion Israel Institute of Technology

Submitted by

seungone

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

CarnegieMellonU

Carnegie Mellon University

Submitted by

pat-jj

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

chromadb

Submitted by

Huang2020

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

SJTU

Shanghai Jiao Tong University

Submitted by

bingyang-lei

Draft-OPD: On-Policy Distillation for Speculative Draft Models

·
11 authors

Submitted by

aHapBean

NITP: Next Implicit Token Prediction for LLM Pre-training

Shanghai-Jiao-Tong-University-SAI

Shanghai Jiao Tong University SAI

Submitted by

KunH

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

KingsCollegeLondon

King's College London

Submitted by

spw2000

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

·
12 authors

Submitted by

Howe666

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

KlingTeam

Submitted by

Z-MU-Z

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

zju

Zhejiang University

Submitted by

Hidir

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

mayzovt

Submitted by

231sm

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

antgroup

Submitted by

tianchez

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

omlab

Submitted by

IPF

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

McAuley-Lab

Submitted by

tnlin

ESPO: Early-Stopping Proximal Policy Optimization

AlibabaTongyiLab

Submitted by

yokey

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

OregonStateUniversity

Oregon State University

Submitted by

wwh0411

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

·
12 authors

Submitted by

gglorian

LVSA: Training-Free Sparse Attention for Long Video Diffusion

·
5 authors

Submitted by

monster119120

Joint Agent Memory and Exploration Learning via Novelty Signals

·
12 authors

Submitted by

AaronHuangWei

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

nvidia

Submitted by

RomanBeliy

Brain-IT-VQA: From Brain Signals to Answers

weizmannscience

Weizmann Institute of Science

Submitted by

MrTaller

StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

Wan-Video

Submitted by

ffjasonyu

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

·
6 authors

Submitted by

Ray2333

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

microsoft

Submitted by

yuyijiong

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

OregonStateUniversity

Oregon State University

Submitted by

ColinLu50

Policy and World Modeling Co-Training for Language Agents

Submitted by

Zhaoningw

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

umich

University of Michigan

Submitted by

hhua2

Agent Skills Should Go Beyond Text: The Case for Visual Skills

·
4 authors

Submitted by

YUEVII

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

RoboStressBench Team

Submitted by

jometeorieNUS

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

·
10 authors

Submitted by

Alessiot

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

google

Submitted by

AtoosaChegini

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

UMCP

University of Maryland College Park

Submitted by

taesiri

Multi-Agent Computer Use

·
3 authors

Submitted by

VLyb

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

ZGCA

Zhongguancun Academy

Submitted by

odunkel

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

GenIntelLab

Generative Intelligence Lab

Submitted by

jaeunglee

Measuring the Depth of LLM Unlearning via Activation Patching

·
3 authors

Submitted by

speed

HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

llm-jp

Submitted by

JamesXZ

FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

·
4 authors

Submitted by

yulupan

SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence

UNC-ChapelHill

University of North Carolina at Chapel Hill

Submitted by

alibayram

Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

magibu

Submitted by

JingHaoZ

Not only where, But when: Temporal Scheduling for RLVR

·
4 authors

Submitted by

barakor

Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

STATE16

Submitted by

taesiri

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

deepmind

Submitted by

mengmengj

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

sambanovasystems

Submitted by

adaamko

ACL-Verbatim: hallucination-free question answering for research

KRLabsOrg

Submitted by

Asukakoko

EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers

SEELE-AI

Submitted by

Cenji630

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

NJU-LINK

Submitted by

adaface-neurips

Confidence-Adaptive SwiGLU for Mixture-of-Experts

·
7 authors

Submitted by

Chuanyang-Jin

MindZero: Learning Online Mental Reasoning With Zero Annotations

JohnsHopkins

Johns Hopkins University

Submitted by

taesiri

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

·
9 authors

Submitted by

anzeameol

Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization

·
4 authors

Submitted by

ethan-caballero

Unified Neural Scaling Laws

deepmind

Submitted by

barakor

Can Predicted Dynamics Exist in the Physical World?

STATE16

Submitted by

shashi-kumar

Geometric Latent Reasoning Induces Shorter Generations in LLMs

Idiap

Idiap Research Institute

Submitted by

taesiri

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

·
4 authors

Submitted by

raphaelrrcoelho

A Formally Verified Library of Mathematical Finance in Lean 4

·
1 author

Submitted by

psp-dada

ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats

·
13 authors

Submitted by

jisx

Model-Based Quality Assessment for Massively Multilingual Parallel Data

MaLA-LM

Submitted by

rishitdagli

FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes

nvidia

Submitted by

yubol

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

CarnegieMellonU

Carnegie Mellon University

Submitted by

yubol

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

CarnegieMellonU

Carnegie Mellon University

Submitted by

strich

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

G4KMU

Hub of Computing and Data Science (HCDS) - G4KMU

Submitted by

mgor

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

qanta-challenge

Submitted by

jomaminoza

The Hamilton-Jacobi Theory of Deep Learning

cair-ph

Center for AI Research

Submitted by

ahan2000

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

LMU

Ludwig Maximilian University of Munich

Submitted by

udbhavbamba

DOT-MoE: Differentiable Optimal Transport for MoEfication

·
5 authors

Submitted by

aboots

Semantic Motion Anchors: Bridging Motion and Meaning in Co-Speech Gestures

·
8 authors

Submitted by

gagan3012

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

nllg

Natural Language Learning & Generation Lab

Submitted by

jianlanluo

τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation

·
20 authors

Submitted by

suraj-ranganath

Show, Don't TELL: Explainable AI-Generated Text Detection

·
2 authors