Gemma-4-A4B 98e v6-coder (C6v3lcb): LCB-targeted code prune of Gemma 4 26B-A4B, 20.8B MoE (4B-active). Same C6 recipe as v5-coder, re-steered specifically at LiveCodeBench-medium, the one code bench pruning hurt most.
It not only keeps the lead on Python but also closes the gap to 1-2pp in the other coding languages.
It's actually reasoning better, fixing the under-thinking and over-thinking failures of the full-expert router.
All this with only 20B comes at a cost, on top of being very coding-specific: about 3x the thinking tokens on LiveCodeBench. But it's good thinking that brings home not only more correct answers but, in general, more precise and concise output.
SCORES (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)
HumanEval 98.78 | HumanEval+ 93.29 | LCB-medium-55 v4 96.36
LCB-medium-100 96.00 | MultiPL-E macro 88.00 (Rust/Java/JS)
MATH-500 91.00 | GPQA-D 67.17 | AIME 63.33 | IFEval 92.00
vs v5-coder: +10.91 LCB-medium / +7.0 MultiPL-E / +10 AIME, HE+ tie
LCB targeting closed the -9.10pp hole and pushed +1.81pp past the unpruned 128e. Top of the 14-22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 vs 98.78).
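The deltas above can be cross-checked with a little arithmetic. The sketch below assumes all three LCB figures (+10.91, -9.10, +1.81) refer to the same LCB-medium-55 v4 split; the post does not state the v5-coder or 128e scores directly, so the baselines printed here are implied values, not reported ones. The function name is illustrative.

```python
# Headline figures quoted in the post (percent / percentage points).
V6_LCB = 96.36          # v6-coder on LCB-medium-55 v4
GAIN_OVER_V5 = 10.91    # "+10.91 LCB-medium" vs v5-coder
LEAD_OVER_128E = 1.81   # "+1.81pp past the unpruned 128e"


def implied_baselines(v6, gain_over_v5, lead_over_full):
    """Back out the baselines implied by the quoted deltas (assumption:
    every delta is measured on the same split as the v6 score)."""
    v5 = round(v6 - gain_over_v5, 2)
    full = round(v6 - lead_over_full, 2)
    return {
        "v5_coder": v5,
        "unpruned_128e": full,
        "v5_gap_to_128e": round(v5 - full, 2),
    }


baselines = implied_baselines(V6_LCB, GAIN_OVER_V5, LEAD_OVER_128E)
print(baselines)

# HumanEval lead over Qwen2.5-Coder-14B-Instruct, quoted as "+9.2pp".
he_gain = round(98.78 - 89.6, 2)
print(he_gain)
```

Under that assumption the implied v5-coder gap to the 128e model comes out at the quoted -9.10pp, so the three numbers are mutually consistent.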
GGUF SWEEP (all imatrix; Q4_K_M plain: imatrix hurt it)
Q6_K | 17.81 GB | 93.29% (cohort top)
Q3_K_M | 10.51 GB | 92.68% | value leader (imatrix lifted the 3-bit tiers hard)
IQ4_XS | 11.01 GB | 92.07% | safe 4-bit
IQ3_XS | 9.22 GB | 92.07% | smallest on the plateau
IQ2_S | 7.83 GB | 89.02% | sub-8 GB code-grade
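To turn the sweep into a quick decision, here is a minimal sketch that picks the highest-scoring quant under a file-size budget. The sizes and scores are copied from the list above; the score column is unlabeled in the post (at Q6_K it matches the HumanEval+ figure, so it is presumably HumanEval+). `Quant` and `best_quant` are illustrative names, and the budget covers file size only: KV cache and context overhead are not modeled.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    name: str
    size_gb: float
    score: float  # percent, as listed in the sweep


# Figures copied from the GGUF sweep in the post.
SWEEP = [
    Quant("Q6_K", 17.81, 93.29),
    Quant("Q3_K_M", 10.51, 92.68),
    Quant("IQ4_XS", 11.01, 92.07),
    Quant("IQ3_XS", 9.22, 92.07),
    Quant("IQ2_S", 7.83, 89.02),
]


def best_quant(budget_gb: float, sweep=SWEEP) -> Optional[Quant]:
    """Return the highest-scoring quant whose file fits the budget.

    Ties on score are broken in favor of the smaller file.
    Returns None when nothing fits.
    """
    fitting = [q for q in sweep if q.size_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: (q.score, -q.size_gb))


for budget in (24, 12, 10, 8):
    print(budget, best_quant(budget))
```

With these numbers, Q3_K_M wins at a 12 GB budget even though IQ4_XS also fits, which is the "value leader" point made in the list.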
SAME-RIG vs Qwen2.5-Coder-14B (RTX 3090, greedy)
Iso-disk 10.5 GB: Q3_K_M 92.68 vs Qwen Q5_K_M 83.54, i.e. +9.14pp at the same file size
LCB-medium-55 v4, identical split: 96.36 vs 18.18
bf16:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it
GGUF:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF
Ollama:
https://ollama.com/mannix/gemma4-98e-v6-coder