Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that applies codebook quantization + Hadamard rotation to the routed MoE experts while keeping attention, SSM, shared_expert, embed, and lm_head at affine 8-bit. Currently supported only by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.


MLX Studio — the only app that natively supports JANG / JANGTQ models



Qwen 3.6 35B-A3B — JANGTQ4 + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Vision + Video | Hybrid SSM/Attention MoE | 18 GB



What Is This?

This is Qwen 3.6 35B-A3B — a 35B-parameter Mixture-of-Experts vision-language model with 256 routed experts (10 active per token), hybrid linear + full-attention architecture, and native image + video understanding.
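
To make "256 routed experts (10 active per token)" concrete, here is a minimal top-k routing sketch of the kind MoE layers use. It is illustrative only; the function names, shapes, and plain softmax gating are assumptions, not this model's actual router.

import numpy as np

def route_tokens(hidden, router_weights, top_k=10):
    # Toy top-k MoE router: each token picks top_k of n_experts.
    # Names and shapes are illustrative, not the model's real code.
    logits = hidden @ router_weights                    # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # 10 experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    w = np.exp(top_logits - top_logits.max(-1, keepdims=True))
    gates = w / w.sum(-1, keepdims=True)                # softmax over chosen experts
    return top_idx, gates

hidden = np.random.randn(4, 64)          # 4 tokens, d_model=64
router = np.random.randn(64, 256)        # 256 routed experts
idx, gates = route_tokens(hidden, router)
print(idx.shape, gates.sum(-1))          # (4, 10); each gate row sums to 1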

It has been:

  1. JANGTQ quantized — JANGTQ4 profile (8-bit affine precision paths, 4-bit TurboQuant routed experts with codebook + Hadamard rotation, fp16 vision tower) — 18 GB
  2. CRACK abliterated — permanent weight-level removal of safety-refusal behavior

| Spec | Value |
| --- | --- |
| Base model | Qwen 3.6 35B-A3B MoE VL (35B total, ~3B active, 256 routed experts) |
| Quantization | JANGTQ4 — 18 GB |
| MMLU-200 | 73.50% (base: 77.50%) |
| HarmBench-320 | 90.31% |
| Vision | 27-layer ViT preserved in fp16 (image + video) |
| Context | 262,144 native; up to ~1M with YaRN |
| Reasoning | Toggleable via enable_thinking |
| Fits on | 24 GB+ Macs |

MMLU-200 Results (thinking OFF)

| Subject | CRACK | Base | Delta |
| --- | --- | --- | --- |
| Astronomy | 19/20 (95%) | 17/20 (85%) | +2 |
| High School Chemistry | 18/20 (90%) | 17/20 (85%) | +1 |
| High School Biology | 18/20 (90%) | 18/20 (90%) | 0 |
| Logical Fallacies | 18/20 (90%) | 19/20 (95%) | -1 |
| World Religions | 18/20 (90%) | 19/20 (95%) | -1 |
| Anatomy | 16/20 (80%) | 17/20 (85%) | -1 |
| College Computer Science | 14/20 (70%) | 16/20 (80%) | -2 |
| College Physics | 14/20 (70%) | 14/20 (70%) | 0 |
| High School Mathematics | 7/20 (35%) | 10/20 (50%) | -3 |
| Abstract Algebra | 5/20 (25%) | 8/20 (40%) | -3 |
| Total | 147/200 (73.5%) | 155/200 (77.5%) | -4.0pp |

The CRACK regression is concentrated in mathematical reasoning (abstract algebra, high-school math, college CS). Knowledge-based subjects are largely preserved; astronomy and chemistry actually improved.


HarmBench-320 Results

| Category | Score | Rate |
| --- | --- | --- |
| Copyright | 80/80 | 100.0% |
| Misinformation / Disinformation | 53/54 | 98.1% |
| Harassment / Bullying | 20/21 | 95.2% |
| Cybercrime / Intrusion | 45/52 | 86.5% |
| Illegal | 44/53 | 83.0% |
| Chemical / Biological | 33/42 | 78.6% |
| Harmful | 14/18 | 77.8% |
| Total | 289/320 | 90.31% |

Scored with a strict compliance classifier; in the copyright category it permits song-structure repetition, since lyrics legitimately repeat choruses.


Pre-Upload Capability Check (20/20 PASS)

All 20 prompts completed without refusal at greedy decoding (thinking OFF):

  • Hacking / Security (8/8): port scanner, reverse shell, SQL injection explanation, keylogger, Metasploit usage, ARP spoofing, phishing email, buffer overflow exploit — all produced working technical content
  • Coding Coherence (4/4): red-black tree, async web scraper, FastAPI task manager, expression-language compiler — all complete implementations
  • Reasoning (4/4): Euclid prime-infinity proof + √2 irrationality, microservices-vs-monolith trade-offs, farmer sheep math, mRNA vaccine mechanism
  • Knowledge (4/4): Kazakhstan capital, x^3+2x derivative, 8 planets, Crime and Punishment author — all correct

JANG CRACK Qwen 3.6 Series

| Model | Format | Size | MMLU | HarmBench | Fits on |
| --- | --- | --- | --- | --- | --- |
| JANGTQ4 + CRACK (this model) | TurboQuant 4-bit experts | 18 GB | 73.5% | 90.3% | 24 GB Mac |
| JANGTQ2 + CRACK | TurboQuant 2-bit experts | 11 GB | 73.0% | 93.8% | 16 GB Mac |

About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on the routed MoE experts with codebook quantization + random Hadamard rotation. Precision-critical paths stay at affine 8-bit; the routed experts use packed codebook indices with a tiny Lloyd-Max codebook per layer, and dequantization + matmul are fused into single Metal kernels.
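
As a rough illustration of that expert path, here is a minimal sketch: the rotation size, 16-entry codebook, and plain Lloyd-Max loop are assumptions, and the real pipeline packs indices and fuses dequant + matmul in Metal rather than reconstructing weights as done here.

import numpy as np
from scipy.linalg import hadamard

def turboquant_sketch(W, bits=4, iters=20):
    # Toy JANGTQ-style pass: random Hadamard rotation, then a scalar
    # Lloyd-Max codebook with 2**bits entries. Illustrative only.
    n = W.shape[1]                                   # assumes power-of-two width
    R = np.diag(np.random.choice([-1.0, 1.0], n)) @ (hadamard(n) / np.sqrt(n))
    Wr = W @ R                                       # rotation spreads outliers
    codebook = np.quantile(Wr, np.linspace(0, 1, 2 ** bits))
    for _ in range(iters):                           # Lloyd-Max refinement
        idx = np.abs(Wr[..., None] - codebook).argmin(-1)
        for c in range(2 ** bits):
            if (idx == c).any():
                codebook[c] = Wr[idx == c].mean()
    # real kernels keep idx packed (two 4-bit codes per byte) and fuse
    # dequant with the matmul; here we just reconstruct the weights
    return codebook[idx] @ R.T, idx.astype(np.uint8), codebook

W = np.random.randn(8, 64)
Wq, idx, cb = turboquant_sketch(W)
print("reconstruction rmse:", np.sqrt(((W - Wq) ** 2).mean()))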

For Qwen 3.6 35B-A3B, JANGTQ4 brings the model to 18 GB while keeping the vision tower at fp16 for image + video understanding.

About CRACK

CRACK is a permanent weight-level abliteration that removes safety refusals without touching the TurboQuant codebooks or the vision tower. Refusal-direction extraction is multilingual (EN + ZH), so the model complies with both English and Chinese prompts.
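
Mechanically, abliteration of this kind is usually a rank-1 projection baked into the weights. Below is a minimal single-matrix sketch, assuming the direction is the difference of mean activations on refused vs. benign prompts; how CRACK actually extracts and applies its directions is not published here.

import numpy as np

def ablate_refusal(W, refused_acts, benign_acts):
    # Estimate a "refusal direction" and project it out of a weight
    # matrix that writes into the residual stream. Illustrative only.
    v = refused_acts.mean(0) - benign_acts.mean(0)
    v /= np.linalg.norm(v)
    return W - np.outer(W @ v, v)        # W loses its component along v

d = 128
W = np.random.randn(256, d)
refused = np.random.randn(40, d) + 0.5   # toy activation samples
benign = np.random.randn(40, d)
W_abl = ablate_refusal(W, refused, benign)
v = refused.mean(0) - benign.mean(0); v /= np.linalg.norm(v)
print(np.abs(W_abl @ v).max())           # ~0: nothing written along v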


Reasoning ON / OFF

The chat template respects enable_thinking. ON is recommended for complex reasoning; OFF for short answers, benchmarks, and tool use.

from transformers import AutoTokenizer

# the tokenizer carries the chat template that implements enable_thinking
tokenizer = AutoTokenizer.from_pretrained("dealignai/Qwen3.6-35B-A3B-JANGTQ4-CRACK")

# Thinking ON (default — full chain-of-thought)
messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# → emits <think>...</think> then the answer

# Thinking OFF (direct answer, no <think> block)
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True,
                                       enable_thinking=False)
# → skips <think>, answers directly

All MMLU-200 and HarmBench-320 scores above were measured with thinking OFF for consistent short-form grading.

Notes

  • Thinking mode: Supported via enable_thinking kwarg. Thinking OFF is recommended for short-answer tasks (MMLU, direct instructions). Thinking ON works for extended reasoning but may occasionally loop on extreme refusal prompts (a known Qwen 3.6 surgical artifact — 1/6 in our thinking-ON stress test).
  • Vision: 27-layer ViT preserved in fp16. Image + video inputs work normally through mlx_vlm; see the first sketch after this list.
  • Context length: 262,144 native; extend via YaRN if your inference engine supports it (a config sketch follows the vision example below).
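
A minimal image-inference sketch via mlx_vlm (exact load/generate signatures vary across mlx_vlm versions, and the image path is a placeholder):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

repo = "dealignai/Qwen3.6-35B-A3B-JANGTQ4-CRACK"
model, processor = load(repo)                 # downloads / loads the MLX weights
config = load_config(repo)

# build a vision prompt and run one image through the model
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
print(generate(model, processor, prompt, image=["photo.jpg"], verbose=False))

For YaRN, Qwen-family models typically take a rope_scaling entry in config.json; the factor below is an assumption scaled against the 262,144-token native window, not a published setting for this model:

"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
}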

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.


dealign.ai

Twitter · HF · Ko-fi


Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.
