TokenButler -- Predict token importance for all heads across the transformer in the first layer itself. Enable fine-grained token sparsity!
YASH AKHAURI
akhauriyash
models 47
akhauriyash/DDR1_Q1.5B-GRPO-DACD
Updated
akhauriyash/DDR1_Q1.5B-DAPO
2B • Updated • 2
akhauriyash/DDR1_Q1.5B-GRPO-CompMath-DummyReward
2B • Updated • 3
akhauriyash/DDR1_Q1.5B-GRPO-CompMath
2B • Updated • 3
akhauriyash/DDR1_Q1.5B-GRPOFixReward
2B • Updated • 2
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-E2EGRPO-OpenR1_Math_SpecR_GRPO_Mini-MiniSet
2B • Updated • 10
akhauriyash/RLM-GemmaS-Code-Amoeba-v0
0.2B • Updated • 1
akhauriyash/RLM-GemmaS-Code-PNAS-v0
0.2B • Updated • 1
akhauriyash/RLM-GemmaS-Code-DARTS-v0
0.2B • Updated • 2 • 1
akhauriyash/RLM-GemmaS-Code-v0
0.2B • Updated • 304 • 3
datasets 7
akhauriyash/Code-Regression
Viewer • Updated • 4.47M • 261 • 5
akhauriyash/GraphArch-Regression
Viewer • Updated • 171k • 106
akhauriyash/GraphNAS-Regression
Updated • 54
akhauriyash/OpenR1_Math_SplitReasoning
Viewer • Updated • 18.5k • 34
akhauriyash/OpenR1_Math_SpeculativeReasoning
Viewer • Updated • 18.5k • 89
akhauriyash/OpenR1_Math_SpecR_GRPO_Mini
Viewer • Updated • 500 • 41
akhauriyash/OpenR1_Math_SpecR_GRPO
Viewer • Updated • 5k • 20