Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that applies codebook quantization + Hadamard rotation to the routed MoE experts while keeping attention, SSM, shared_expert, embed, and lm_head at affine 8-bit. Currently supported only by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.


MLX Studio — the only app that natively supports JANG / JANGTQ models



Qwen 3.6 35B-A3B — JANGTQ4 + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Vision + Video | Hybrid SSM/Attention MoE | 18 GB



What Is This?

This is Qwen 3.6 35B-A3B — a 35B-parameter Mixture-of-Experts vision-language model with 256 routed experts (10 active per token), hybrid linear + full-attention architecture, and native image + video understanding.
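
To make "256 routed experts (10 active per token)" concrete, here is a minimal top-k routing sketch of the kind MoE layers use. It is illustrative only; the function names, shapes, and plain softmax gating are assumptions, not this model's actual router.

import numpy as np

def route_tokens(hidden, router_weights, top_k=10):
    # Toy top-k MoE router: each token picks top_k of n_experts.
    # Names and shapes are illustrative, not the model's real code.
    logits = hidden @ router_weights                    # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # 10 experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    w = np.exp(top_logits - top_logits.max(-1, keepdims=True))
    gates = w / w.sum(-1, keepdims=True)                # softmax over chosen experts
    return top_idx, gates

hidden = np.random.randn(4, 64)          # 4 tokens, d_model=64
router = np.random.randn(64, 256)        # 256 routed experts
idx, gates = route_tokens(hidden, router)
print(idx.shape, gates.sum(-1))          # (4, 10); each gate row sums to 1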

It has been:

  1. JANGTQ quantized — JANGTQ4 profile (8-bit affine precision paths, 4-bit TurboQuant routed experts with codebook + Hadamard rotation, fp16 vision tower) — 18 GB
  2. CRACK abliterated — permanent weight-level removal of safety-refusal behavior

| Spec | Value |
| --- | --- |
| Base model | Qwen 3.6 35B-A3B MoE VL (35B total, ~3B active, 256 routed experts) |
| Quantization | JANGTQ4 — 18 GB |
| MMLU-200 | 73.50% (base: 77.50%) |
| HarmBench-320 | 90.31% |
| Vision | 27-layer ViT preserved in fp16 (image + video) |
| Context | 262,144 native; up to ~1M with YaRN |
| Reasoning | Toggleable via enable_thinking |
| Fits on | 24 GB+ Macs |

MMLU-200 Results (thinking OFF)

| Subject | CRACK | Base | Delta |
| --- | --- | --- | --- |
| Astronomy | 19/20 (95%) | 17/20 (85%) | +2 |
| High School Chemistry | 18/20 (90%) | 17/20 (85%) | +1 |
| High School Biology | 18/20 (90%) | 18/20 (90%) | 0 |
| Logical Fallacies | 18/20 (90%) | 19/20 (95%) | -1 |
| World Religions | 18/20 (90%) | 19/20 (95%) | -1 |
| Anatomy | 16/20 (80%) | 17/20 (85%) | -1 |
| College Computer Science | 14/20 (70%) | 16/20 (80%) | -2 |
| College Physics | 14/20 (70%) | 14/20 (70%) | 0 |
| High School Mathematics | 7/20 (35%) | 10/20 (50%) | -3 |
| Abstract Algebra | 5/20 (25%) | 8/20 (40%) | -3 |
| Total | 147/200 (73.5%) | 155/200 (77.5%) | -4.0pp |

The CRACK regression is concentrated in mathematical reasoning (abstract algebra, high-school math, college CS). Knowledge-based subjects are largely preserved; astronomy and chemistry actually improved.


HarmBench-320 Results

| Category | Score | Rate |
| --- | --- | --- |
| Copyright | 80/80 | 100.0% |
| Misinformation / Disinformation | 53/54 | 98.1% |
| Harassment / Bullying | 20/21 | 95.2% |
| Cybercrime / Intrusion | 45/52 | 86.5% |
| Illegal | 44/53 | 83.0% |
| Chemical / Biological | 33/42 | 78.6% |
| Harmful | 14/18 | 77.8% |
| Total | 289/320 | 90.31% |

Scored with a strict compliance classifier; in the copyright category it permits song-structure repetition, since lyrics legitimately repeat choruses.


Pre-Upload Capability Check (20/20 PASS)

All 20 prompts completed without refusal at greedy decoding (thinking OFF):

  • Hacking / Security (8/8): port scanner, reverse shell, SQL injection explanation, keylogger, Metasploit usage, ARP spoofing, phishing email, buffer overflow exploit — all produced working technical content
  • Coding Coherence (4/4): red-black tree, async web scraper, FastAPI task manager, expression-language compiler — all complete implementations
  • Reasoning (4/4): Euclid prime-infinity proof + √2 irrationality, microservices-vs-monolith trade-offs, farmer sheep math, mRNA vaccine mechanism
  • Knowledge (4/4): Kazakhstan capital, x^3+2x derivative, 8 planets, Crime and Punishment author — all correct

JANG CRACK Qwen 3.6 Series

| Model | Format | Size | MMLU | HarmBench | Fits on |
| --- | --- | --- | --- | --- | --- |
| JANGTQ4 + CRACK (this model) | TurboQuant 4-bit experts | 18 GB | 73.5% | 90.3% | 24 GB Mac |
| JANGTQ2 + CRACK | TurboQuant 2-bit experts | 11 GB | 73.0% | 93.8% | 16 GB Mac |

About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on the routed MoE experts with codebook quantization + random Hadamard rotation. Precision-critical paths stay at affine 8-bit; the routed experts use packed codebook indices with a tiny Lloyd-Max codebook per layer, and dequantization + matmul are fused into single Metal kernels.
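
As a rough illustration of that expert path, here is a minimal sketch: the rotation size, 16-entry codebook, and plain Lloyd-Max loop are assumptions, and the real pipeline packs indices and fuses dequant + matmul in Metal rather than reconstructing weights as done here.

import numpy as np
from scipy.linalg import hadamard

def turboquant_sketch(W, bits=4, iters=20):
    # Toy JANGTQ-style pass: random Hadamard rotation, then a scalar
    # Lloyd-Max codebook with 2**bits entries. Illustrative only.
    n = W.shape[1]                                   # assumes power-of-two width
    R = np.diag(np.random.choice([-1.0, 1.0], n)) @ (hadamard(n) / np.sqrt(n))
    Wr = W @ R                                       # rotation spreads outliers
    codebook = np.quantile(Wr, np.linspace(0, 1, 2 ** bits))
    for _ in range(iters):                           # Lloyd-Max refinement
        idx = np.abs(Wr[..., None] - codebook).argmin(-1)
        for c in range(2 ** bits):
            if (idx == c).any():
                codebook[c] = Wr[idx == c].mean()
    # real kernels keep idx packed (two 4-bit codes per byte) and fuse
    # dequant with the matmul; here we just reconstruct the weights
    return codebook[idx] @ R.T, idx.astype(np.uint8), codebook

W = np.random.randn(8, 64)
Wq, idx, cb = turboquant_sketch(W)
print("reconstruction rmse:", np.sqrt(((W - Wq) ** 2).mean()))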

For Qwen 3.6 35B-A3B, JANGTQ4 brings the model to 18 GB while keeping the vision tower at fp16 for image + video understanding.

About CRACK

CRACK is a permanent weight-level abliteration that removes safety refusals without touching the TurboQuant codebooks or the vision tower. Refusal-direction extraction is multilingual (EN + ZH), so the model complies with both English and Chinese prompts.
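
Mechanically, abliteration of this kind is usually a rank-1 projection baked into the weights. Below is a minimal single-matrix sketch, assuming the direction is the difference of mean activations on refused vs. benign prompts; how CRACK actually extracts and applies its directions is not published here.

import numpy as np

def ablate_refusal(W, refused_acts, benign_acts):
    # Estimate a "refusal direction" and project it out of a weight
    # matrix that writes into the residual stream. Illustrative only.
    v = refused_acts.mean(0) - benign_acts.mean(0)
    v /= np.linalg.norm(v)
    return W - np.outer(W @ v, v)        # W loses its component along v

d = 128
W = np.random.randn(256, d)
refused = np.random.randn(40, d) + 0.5   # toy activation samples
benign = np.random.randn(40, d)
W_abl = ablate_refusal(W, refused, benign)
v = refused.mean(0) - benign.mean(0); v /= np.linalg.norm(v)
print(np.abs(W_abl @ v).max())           # ~0: nothing written along v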


Reasoning ON / OFF

The chat template respects enable_thinking. ON is recommended for complex reasoning; OFF for short answers, benchmarks, and tool use.

from transformers import AutoTokenizer

# the tokenizer carries the chat template that implements enable_thinking
tokenizer = AutoTokenizer.from_pretrained("dealignai/Qwen3.6-35B-A3B-JANGTQ4-CRACK")

# Thinking ON (default — full chain-of-thought)
messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# → emits <think>...</think> then the answer

# Thinking OFF (direct answer, no <think> block)
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True,
                                       enable_thinking=False)
# → skips <think>, answers directly

All MMLU-200 and HarmBench-320 scores above were measured with thinking OFF for consistent short-form grading.

Notes

  • Thinking mode: Supported via enable_thinking kwarg. Thinking OFF is recommended for short-answer tasks (MMLU, direct instructions). Thinking ON works for extended reasoning but may occasionally loop on extreme refusal prompts (a known Qwen 3.6 surgical artifact — 1/6 in our thinking-ON stress test).
  • Vision: 27-layer ViT preserved in fp16. Image + video inputs work normally through mlx_vlm; see the first sketch after this list.
  • Context length: 262,144 native; extend via YaRN if your inference engine supports it (a config sketch follows the vision example below).
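
A minimal image-inference sketch via mlx_vlm (exact load/generate signatures vary across mlx_vlm versions, and the image path is a placeholder):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

repo = "dealignai/Qwen3.6-35B-A3B-JANGTQ4-CRACK"
model, processor = load(repo)                 # downloads / loads the MLX weights
config = load_config(repo)

# build a vision prompt and run one image through the model
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
print(generate(model, processor, prompt, image=["photo.jpg"], verbose=False))

For YaRN, Qwen-family models typically take a rope_scaling entry in config.json; the factor below is an assumption scaled against the 262,144-token native window, not a published setting for this model:

"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
}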

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.


dealign.ai

Twitter · HF · Ko-fi


Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.
