Benchmark scores (mxfp8 quantization):

| Model | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| granite-4.1-8b-Abliterated-AND-Disinhibited | 0.496 | 0.692 | 0.864 | 0.666 | 0.466 | 0.770 | 0.632 |
| granite-4.1-8b | 0.486 | 0.666 | 0.875 | 0.636 | 0.450 | 0.766 | 0.631 |

| Model | Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|---|
| granite-4.1-8b-Abliterated-AND-Disinhibited | mxfp8 | 9.518 ± 0.094 | 11.75 GB | 686 |
| granite-4.1-8b | mxfp8 | 10.134 ± 0.107 | 12.17 GB | 668 |
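For reference, the Perplexity column above is the exponential of the mean negative log-likelihood per token on an evaluation text. A minimal sketch of that computation, assuming you already have a list of per-token log-probabilities (the helper name `perplexity` is illustrative, not part of mlx-lm):

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: if every token had probability 1/4, perplexity is 4.
logps = [math.log(0.25)] * 10
print(round(perplexity(logps), 3))  # → 4.0
```

Lower is better: the table suggests the abliterated conversion slightly improves perplexity relative to the mxfp8-quantized base model on this evaluation.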
This model `granite-4.1-8b-Abliterated-AND-Disinhibited-mxfp8-mlx` was converted to MLX format from `treadon/granite-4.1-8b-Abliterated-AND-Disinhibited` using mlx-lm version 0.31.3.
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("granite-4.1-8b-Abliterated-AND-Disinhibited-mxfp8-mlx")

prompt = "hello"

# Apply the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
**Quantization:** 8-bit (mxfp8)

**Base model:** ibm-granite/granite-4.1-8b