ManniX PRO

ManniX-ITA

https://github.com/mann1x

mann1x

AI & ML interests

None yet

Recent Activity

replied to wenhuach's post 3 days ago

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), currently supporting `Pure RTN mode` powered by AutoRound ⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

updated a model 3 days ago

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it

Organizations

None yet

Posts 7

Post

171

🚀 Gemma-4-A4B 98e v6-coder (C6v3lcb) — LCB-targeted code prune of Gemma 4 26B-A4B, 20.8B MoE (4B-active). Same C6 recipe as v5-coder, re-steered specifically at LiveCodeBench-medium — the one code bench pruning hurt most.

It not only keeps the lead on Python but also closes the gap to 1-2pp in the other coding languages.

It's actually reasoning better, fixing the under-thinking and over-thinking failures of the full-expert router.

All this comes at a cost for a model of only 20B, on top of it being very specific to coding: about 3x the thinking tokens on LiveCodeBench. It's good thinking, though, and it brings home not only more correct answers but, in general, more precise and concise output.

📊 SCORES (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)

HumanEval 98.78 — HumanEval+ 93.29 — LCB-medium-55 v4 96.36
LCB-medium-100 96.00 — MultiPL-E macro 88.00 (Rust/Java/JS)
MATH-500 91.00 — GPQA-D 67.17 — AIME 63.33 — IFEval 92.00
vs v5-coder: +10.91 LCB-medium / +7.0 MultiPL-E / +10 AIME, HE+ tie

LCB targeting closed the −9.10pp hole and pushed +1.81pp past the unpruned 128e. Top of the 14–22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78).
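
The deltas quoted above can be re-derived from the posted scores. A minimal sketch in Python: the variable names are illustrative, the v5-coder figure is the LCB-medium-55 v4 score from the v5-coder post (measured there on NVFP4A16, so the comparison crosses quant formats), and the 128e baseline is only implied by the "+1.81pp" statement, not a score reported here.

```python
# Scores copied from the posts; names are illustrative, not from any tool.
v6_lcb = 96.36   # v6-coder, LCB-medium-55 v4
v5_lcb = 85.45   # v5-coder, LCB-medium-55 v4
v6_he = 98.78    # v6-coder, HumanEval
qwen_he = 89.6   # Qwen2.5-Coder-14B-Instruct, HumanEval

gain_over_v5 = round(v6_lcb - v5_lcb, 2)
gain_over_qwen = round(v6_he - qwen_he, 2)

# Baseline implied by "+1.81pp past the unpruned 128e" (an inference, not a reported score).
implied_128e = round(v6_lcb - 1.81, 2)
# Gap between v5-coder and that implied baseline; should match the "-9.10pp hole".
hole = round(v5_lcb - implied_128e, 2)

print(gain_over_v5, gain_over_qwen, implied_128e, hole)
```

If the numbers are consistent, the v5 gain should equal the size of the hole plus the margin past 128e.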

📦 GGUF SWEEP (all imatrix; Q4_K_M plain — imatrix hurt it)

Q6_K — 17.81 GB — 93.29% (cohort top)
Q3_K_M — 10.51 GB — 92.68% ⭐ value leader (imatrix lifted the 3-bit tiers hard)
IQ4_XS — 11.01 GB — 92.07% ⭐ safe 4-bit
IQ3_XS — 9.22 GB — 92.07% — smallest on the plateau
IQ2_S — 7.83 GB — 89.02% — sub-8 GB code-grade
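
One way to read the sweep is "best score that fits a given disk or VRAM budget". The sketch below encodes the five tiers listed above and picks the highest-scoring one under a budget. The `Quant` type and `best_under` helper are hypothetical names for illustration. The percentage is assumed to be HumanEval+ (the Q6_K figure matches the HumanEval+ score above), and the budget covers only the file size, not KV cache or context memory.

```python
from typing import NamedTuple, Optional

class Quant(NamedTuple):
    name: str
    size_gb: float
    score: float  # percentage reported in the sweep

# Tiers and figures copied from the sweep above.
SWEEP = [
    Quant("Q6_K", 17.81, 93.29),
    Quant("Q3_K_M", 10.51, 92.68),
    Quant("IQ4_XS", 11.01, 92.07),
    Quant("IQ3_XS", 9.22, 92.07),
    Quant("IQ2_S", 7.83, 89.02),
]

def best_under(budget_gb: float, sweep=SWEEP) -> Optional[Quant]:
    """Highest-scoring quant whose file fits the budget; ties go to the smaller file."""
    fitting = [q for q in sweep if q.size_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: (q.score, -q.size_gb))

for budget in (8, 10, 12, 24):
    print(budget, best_under(budget))
```

The tie-break on file size is a design choice: IQ4_XS and IQ3_XS report the same score, so the smaller file wins whenever both fit and nothing scores higher.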

⚔️ SAME-RIG vs Qwen2.5-Coder-14B (RTX 3090, greedy)

Iso-disk 10.5 GB: Q3_K_M 92.68 vs Qwen Q5_K_M 83.54 → +9.14pp at the same file size
LCB-medium-55 v4, identical split: 96.36 vs 18.18

bf16:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it
GGUF:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF
Ollama:
https://ollama.com/mannix/gemma4-98e-v6-coder

Post

233

🚀 Gemma-4-A4B 98e v5-coder — code-leaning 20.8B MoE (4B-active), C6 layer-relevance-weighted prune of Gemma 4 26B-A4B. Best 20B-class coder I've shipped.

📊 SCORES (NVFP4A16, vLLM 0.20.2, greedy, EVAL_PROTOCOL v3)

HumanEval 98.17 — HumanEval+ 92.68 — LCB-medium-55 v4 85.45
MATH-500 92.00 — GPQA-D 68.69 — IFEval 94.00
vs v4: +1.22 HE / +1.22 HE+ / +7.27 LCB-medium

Top of the 14–22B coder band: +8.6pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.17). HE+ sanity-audited — no memorization, no silent-empty.

📦 EXTENSIVE GGUF SWEEP (16 plain + IQ tiers + 5 CD recipes, all imatrix-calibrated)

Q8_0 — 21.16 GB — 93.90% (cohort top)
Q4_K_S — 12.21 GB — 93.29% ⭐ plain sweet spot
IQ4_XS — 11.01 GB — 93.29% ⭐ sub-12 GB top

⭐ TWO EXCELLENT SUB-10 GB CONTRIBDYNAMIC CD PICKS (per-layer + IQ-codebook overrides)

CD-IQ4_K_M (Canary W) — 10.29 GB — 92.07% — recommended sub-11 GB
CD-IQ3_XS_L — 9.27 GB — 90.24% — smallest viable code-grade

⚔️ SAME-RIG vs Qwen2.5-Coder-14B-Instruct (RTX 3090, greedy HE+)

11 GB band: v5-coder IQ4_XS wins +9.75pp at -1.49 bpw
12 GB band: Q4_K_S wins +8.53pp
8 GB band: IQ2_S wins +0.61pp at lower bpw
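
For readers who want a rough feel for the bits-per-weight (bpw) figures, average bpw can be approximated from file size and parameter count. This is a sketch under stated assumptions: 1 GB = 1e9 bytes, 20.8B total parameters, and file metadata ignored. The `approx_bpw` helper is hypothetical, and the post's own "-1.49 bpw" may have been computed differently; the Qwen file sizes are not given here, so that comparison is not reproduced.

```python
def approx_bpw(file_gb: float, params_billion: float) -> float:
    """Average bits per weight, assuming 1 GB = 1e9 bytes and ignoring metadata."""
    return file_gb * 8 / params_billion

# File sizes copied from the v5-coder sweep above; 20.8B is the stated model size.
for name, gb in [("Q8_0", 21.16), ("Q4_K_S", 12.21), ("IQ4_XS", 11.01)]:
    print(f"{name}: {approx_bpw(gb, 20.8):.2f} bpw")
```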

bf16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it

GGUF:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF

NVFP4A16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16

Ollama:
https://ollama.com/mannix/gemma4-98e-v5-coder

———

🆕 BONUS — Qwen3.6-27B-Omnimerge-v4-MTP-GGUF

Same v4 weights with the native MTP head retained for llama.cpp speculative decoding (PR #22673, --spec-type draft-mtp). 7 imatrix tiers Q8_0 → IQ2_M.

HumanEval: 2.0x decode tok/s
MBPP: 2.33x decode tok/s
Both at +1-2pp pass@1 vs the non-MTP build. GPQA Diamond comparison in flight.

MTP-GGUF:
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF

Collections 1

models 47

datasets 1

ManniX-ITA/osync-code

Viewer • Updated Jan 12 • 1 • 18